Add support for atomic version of 2-tile stream-k algorithm.
This change also cleans up the stream-k tuning parameters. Atomic mode is now a separate parameter.
StreamK=0: off (regular DP kernel)
StreamK=1: basic stream-k
StreamK=2: two-tile stream-k
StreamKAtomic: toggles use of atomics for the fixup step (and the requirements for flags, workspace, and init kernel)
Atomic mode currently only works for SGEMM. Update the test suite to use the new parameters and include atomic two-tile tests.
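
A minimal sketch of how the two parameters combine, for illustration only. The helper name `describe_streamk`, its arguments, and the returned strings are hypothetical and not part of this change. Treating StreamKAtomic as having no effect when StreamK=0 is an assumption.

```python
# Hypothetical helper: maps the tuning parameters described above to a
# human-readable mode. It is not part of the actual code base.
STREAMK_MODES = {
    0: "off (regular DP kernel)",
    1: "basic stream-k",
    2: "two-tile stream-k",
}


def describe_streamk(stream_k, stream_k_atomic=False, is_sgemm=True):
    """Return a description of the selected stream-k configuration."""
    if stream_k not in STREAMK_MODES:
        raise ValueError(f"unsupported StreamK value: {stream_k}")
    mode = STREAMK_MODES[stream_k]
    # Assumption: the atomic flag is ignored when stream-k is off.
    if stream_k == 0 or not stream_k_atomic:
        return mode
    # From the description above: atomic mode currently only works for SGEMM.
    if not is_sgemm:
        raise ValueError("atomic mode currently only works for SGEMM")
    return mode + ", atomic fixup"
```

For example, `describe_streamk(2, stream_k_atomic=True)` corresponds to the new atomic two-tile configuration.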