ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Atomic 2-tile stream-k and tuning parameter clean-up #1906

Closed AlexBrownAMD closed 6 months ago

AlexBrownAMD commented 6 months ago

Add support for an atomic version of the 2-tile stream-k algorithm.

This change also cleans up the stream-k tuning parameters. Atomic mode is now a separate parameter.

- StreamK=0 is off (regular DP kernel)
- StreamK=1 is basic stream-k
- StreamK=2 is two-tile stream-k
- StreamKAtomic toggles the use of atomics for the fixup step (and with it the requirements for flags, workspace, and init kernel)

Atomic mode currently only works for SGEMM.

Updated the test suite to use the new parameters and to include atomic 2-tile tests.
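
The parameter combinations described above can be sketched as a small Python check. This is only an illustration, not Tensile's actual validation code: the helper names `is_valid_combo` and `valid_combos`, the 0/1 encoding of `StreamKAtomic`, and the rule that the atomic toggle has no effect when `StreamK=0` are all assumptions made for the example. Only the meanings of the `StreamK` values and the SGEMM-only restriction come from the description.

```python
from itertools import product

# Meanings of the StreamK values as listed in the PR description.
STREAMK_MODES = {
    0: "off (regular DP kernel)",
    1: "basic stream-k",
    2: "two-tile stream-k",
}


def is_valid_combo(stream_k, stream_k_atomic, is_sgemm):
    """Hypothetical check; not Tensile's real parameter validation."""
    if stream_k not in STREAMK_MODES:
        return False
    # Assumption: StreamKAtomic is a 0/1 toggle.
    if stream_k_atomic not in (0, 1):
        return False
    # Assumption: the atomic toggle is only meaningful when stream-k is on.
    if stream_k_atomic and stream_k == 0:
        return False
    # From the PR description: atomic mode currently only works for SGEMM.
    if stream_k_atomic and not is_sgemm:
        return False
    return True


def valid_combos(is_sgemm):
    """List the (StreamK, StreamKAtomic) pairs accepted by the check above."""
    return [
        (sk, atomic)
        for sk, atomic in product(sorted(STREAMK_MODES), (0, 1))
        if is_valid_combo(sk, atomic, is_sgemm)
    ]


if __name__ == "__main__":
    for sk, atomic in valid_combos(is_sgemm=True):
        print(f"StreamK={sk} ({STREAMK_MODES[sk]}), StreamKAtomic={atomic}")
```

Under these assumptions, an SGEMM problem can use either fixup variant with `StreamK=1` or `StreamK=2`, while other data types are limited to `StreamKAtomic=0`.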