ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Stream-K Batch #1861

Closed by AlexBrownAMD 9 months ago

AlexBrownAMD commented 9 months ago

Adds batch support for Stream-K kernels, along with some new test cases. Also includes a small fix to the rocBLAS liblogic converter script so that it works with old liblogic files that do not list the xf32 parameter.
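A fallback for older logic files could look roughly like the sketch below. This is not the code from this PR: the key name `"XF32"`, the default value, and the helper `fill_missing_params` are all hypothetical and are used only to illustrate the idea of defaulting a parameter that an older file does not list.

```python
# Hypothetical defaults for parameters that older logic files may omit.
# "XF32" is a placeholder key, not necessarily the real field name.
LEGACY_DEFAULTS = {"XF32": False}


def fill_missing_params(solution, defaults=LEGACY_DEFAULTS):
    """Return a copy of `solution` with any absent keys set to their defaults."""
    filled = dict(solution)          # leave the caller's dict untouched
    for key, value in defaults.items():
        filled.setdefault(key, value)  # only fills keys that are missing
    return filled


# An entry from an "old" file (no XF32 key) and one from a "new" file.
old_entry = {"name": "kernel_a"}
new_entry = {"name": "kernel_b", "XF32": True}

old_filled = fill_missing_params(old_entry)
new_filled = fill_missing_params(new_entry)
```

Existing values are left alone, so newer files that already list the parameter pass through unchanged.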

nakajee commented 9 months ago

Does this work with non-strided batched cases?

nakajee commented 9 months ago

Would you create a separate PR for the new tool? Also, please add a brief comment on how to use it.

AlexBrownAMD commented 9 months ago

> Does this work with non-strided batched cases?

General batch is not supported, only strided batch or non-batch.

nakajee commented 9 months ago

> > Does this work with non-strided batched cases?

> General batch is not supported, only strided batch or non-batch.

In that case, I think it is better to add a reject condition for the general batch case.

AlexBrownAMD commented 9 months ago

> Would you create a separate PR for the new tool? Also, please add a brief comment on how to use it.

Tuning scripts moved to PR #1865.

AlexBrownAMD commented 9 months ago

> > > Does this work with non-strided batched cases?

> > General batch is not supported, only strided batch or non-batch.

> In that case, I think it is better to add a reject condition for the general batch case.

Added a reject condition.
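For readers unfamiliar with the pattern, a reject condition of this kind might look roughly like the sketch below. It is not the code added in this PR. The dictionary keys `"StreamK"` and `"GeneralBatch"` and the helper `reject_reason` are assumptions made for illustration; the only behaviour taken from the discussion is that a Stream-K kernel should be rejected for general batch and accepted for strided batch or non-batch.

```python
def reject_reason(solution, problem):
    """Return a reason string if `solution` must not be used for `problem`,
    otherwise None. Both arguments are plain dicts in this sketch."""
    uses_stream_k = solution.get("StreamK", 0) != 0       # assumed key name
    general_batch = problem.get("GeneralBatch", False)    # assumed key name
    if uses_stream_k and general_batch:
        return "Stream-K supports only strided batch or non-batch"
    return None


stream_k_solution = {"StreamK": 1}
plain_solution = {"StreamK": 0}

strided_problem = {"GeneralBatch": False}
general_problem = {"GeneralBatch": True}

# Stream-K with general batch is the only combination that gets rejected.
rejected = reject_reason(stream_k_solution, general_problem)
accepted_strided = reject_reason(stream_k_solution, strided_problem)
accepted_plain = reject_reason(plain_solution, general_problem)
```

Returning a reason string rather than a bare boolean makes it easier to log why a solution was filtered out.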

nakajee commented 9 months ago

The code change looks OK. Do you know why extended CI was not triggered? I would like to make sure the test cases you added pass.

AlexBrownAMD commented 9 months ago

> The code change looks OK. Do you know why extended CI was not triggered? I would like to make sure the test cases you added pass.

Not sure why it didn't start automatically. I just started a run manually.