ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License
212 stars 145 forks

Stream-k debug settings #1915

Closed by AlexBrownAMD 4 months ago

AlexBrownAMD commented 4 months ago

Add new settings for debugging and profiling stream-k kernels. DebugStreamKNoFixup generates a kernel with no fixup step. DebugStreamKNoPartials generates a kernel that never writes partial results to the workspace. Setting the environment variable TENSILE_DB2=0x2 skips running the stream-k init kernel.
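The value 0x2 suggests that TENSILE_DB2 is read as a bitmask, with one bit selecting the "skip stream-k init kernel" behaviour. The sketch below only illustrates that kind of check; it is not Tensile's code. The helper `debug_flag_set` and the constant name `SKIP_STREAMK_INIT` are hypothetical, and how Tensile actually parses the variable is not shown in this PR.

```python
import os

# Hypothetical name for the bit described above; not taken from Tensile's source.
SKIP_STREAMK_INIT = 0x2


def debug_flag_set(bit, env=None, var="TENSILE_DB2"):
    """Return True if `bit` is set in the integer stored in environment variable `var`."""
    env = os.environ if env is None else env
    raw = env.get(var, "0")
    try:
        # Base 0 lets int() accept prefixed literals such as "0x2" as well as plain "2".
        value = int(raw, 0)
    except ValueError:
        # Treat an unparsable value as "no debug flags set" (an assumption of this sketch).
        return False
    return bool(value & bit)


if __name__ == "__main__":
    if debug_flag_set(SKIP_STREAMK_INIT):
        print("stream-k init kernel would be skipped")
    else:
        print("stream-k init kernel would run")
```

Because the check masks a single bit, a value such as 0x3 would also enable the skip while leaving room for other debug bits in the same variable.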