ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

New dynamic mode #1900

Closed: AlexBrownAMD closed this pull request 7 months ago

AlexBrownAMD commented 7 months ago

Adds a new dynamic mode that runs large problems on a slightly reduced CU count when doing so improves work division and power. Enable the mode by setting the environment variable TENSILE_STREAMK_DYNAMIC_GRID=2. The option is still being benchmarked to determine where it works best, but initial tests indicate that it should improve Stream-K kernel performance on gfx942.
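As a minimal sketch, the mode can be turned on for the current shell session by exporting the variable before launching whatever application calls into Tensile. This assumes a POSIX-compatible shell. The value `2` comes from the PR text; the `echo` check is only a convenience and is not part of Tensile.

```shell
# Select the new dynamic Stream-K grid mode (value 2, per this PR)
export TENSILE_STREAMK_DYNAMIC_GRID=2

# Confirm the setting before launching the Tensile-backed application
echo "TENSILE_STREAMK_DYNAMIC_GRID=${TENSILE_STREAMK_DYNAMIC_GRID}"
```

To scope the setting to a single run instead, prefix the command, for example `TENSILE_STREAMK_DYNAMIC_GRID=2 ./your_app`. Here `./your_app` is a placeholder, not a real binary.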