ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Analytical grid size prediction model for Stream-K #1929

Closed AlexBrownAMD closed 2 months ago

AlexBrownAMD commented 2 months ago

This change integrates the analytical grid size prediction model developed by @neoblizz. The model is meant to predict the optimal launch grid size for a given problem when running Stream-K, based on the estimated time to complete steps such as data-parallel tiles, partial tiles, workspace writes, and fixup-step coordination.
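
To illustrate the general idea of an analytical grid-size model, here is a minimal Python sketch. It is not the code in this PR: the cost formula, the `Weights` values, and every function name are assumptions made up for illustration. It scores each candidate grid size with a weighted sum of per-workgroup iteration time, a workspace-write penalty when tiles are split, and a fixup penalty per extra contributing workgroup, then picks the cheapest candidate.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical cost weights; real values would need tuning per device."""
    per_iteration: float = 1.0    # cost of one K-loop iteration in a workgroup
    workspace_write: float = 4.0  # cost of writing a partial tile to workspace
    fixup_per_peer: float = 2.0   # cost per extra workgroup sharing a tile


def predicted_time(grid, tiles, iters_per_tile, w):
    """Estimate the relative time of a launch with `grid` workgroups."""
    total_iters = tiles * iters_per_tile
    iters_per_wg = math.ceil(total_iters / grid)
    # Partial tiles appear when a workgroup's share is not a whole number of tiles.
    has_partial = iters_per_wg % iters_per_tile != 0
    # Rough count of workgroups contributing to one tile. Simplification:
    # tiles that merely straddle a workgroup boundary are not counted here.
    peers = math.ceil(iters_per_tile / iters_per_wg)
    return (w.per_iteration * iters_per_wg
            + (w.workspace_write if has_partial else 0.0)
            + w.fixup_per_peer * (peers - 1))


def predict_grid_size(m, n, k, tile_m, tile_n, tile_k, max_grid, w=Weights()):
    """Return the candidate grid size with the lowest predicted time."""
    tiles = math.ceil(m / tile_m) * math.ceil(n / tile_n)
    iters_per_tile = math.ceil(k / tile_k)
    # min() keeps the first minimum, so ties resolve to the smaller grid.
    return min(range(1, max_grid + 1),
               key=lambda g: predicted_time(g, tiles, iters_per_tile, w))


if __name__ == "__main__":
    print(predict_grid_size(256, 256, 1024, 128, 128, 32, max_grid=8))
```

With the default weights above, splitting is cheap, so the sketch chooses the largest grid (8) for this problem. Raising `workspace_write` and `fixup_per_peer` pushes the choice back to the data-parallel grid of 4, one workgroup per tile, which is the kind of trade-off the prediction weights control.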

The new mode can be enabled by setting TENSILE_STREAMK_DYNAMIC_GRID=3 and is currently experimental. A future update may need to adjust the prediction weights. Once research is complete, this mode will become the default option and the environment variable settings will be updated.
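
As a small sketch of how the variable could be set from a Python test or benchmark driver, the helper below builds an environment mapping with the mode enabled. The helper name `streamk_env` and the binary `./my_gemm_app` are hypothetical; only the variable name and the value 3 come from this PR description.

```python
import os


def streamk_env(mode=3, base=None):
    """Return a copy of an environment mapping with the Stream-K grid mode set.

    `mode=3` selects the experimental analytical model described above.
    The input mapping is copied, never modified in place.
    """
    env = dict(os.environ if base is None else base)
    env["TENSILE_STREAMK_DYNAMIC_GRID"] = str(mode)
    return env


if __name__ == "__main__":
    env = streamk_env()
    print(env["TENSILE_STREAMK_DYNAMIC_GRID"])
    # To launch a program with this setting (./my_gemm_app is hypothetical):
    #   import subprocess
    #   subprocess.run(["./my_gemm_app"], env=env, check=True)
```

Passing the mapping through `env=` keeps the setting scoped to the child process, so other runs in the same session still use the current default.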

nakajee commented 2 months ago

Please resolve the conflict.

nakajee commented 2 months ago

The code change looks OK. Please fix the static-analysis failures.

nakajee commented 2 months ago

Wait. Some fail in 90a precheckin... Not sure what is wrong...

nakajee commented 2 months ago

Please check with @AlexBrownAMD to see if the fail is related to this change or not.

AlexBrownAMD commented 2 months ago

> Please check with @AlexBrownAMD to see if the fail is related to this change or not.

The 90a failures do not appear to be related, but I just fixed the clang-format errors, so we can let the 90a tests run again.

nakajee commented 2 months ago

The 90a test failed again (most likely not related to this change). I just added the gfx942 label to make sure there are no failures on gfx942.