Closed by zhanglx13 6 months ago
GEMM Tuning Script v3.1
With the changes in https://github.com/ROCmSoftwarePlatform/triton/pull/466, mfma16 can outperform mfma32. This PR therefore adds matrix_instr_nonkdim to the tuning space.
It also fixes the warmup kernel, which used the wrong type.
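
As a rough illustration of what adding matrix_instr_nonkdim to the tuning space could look like, here is a minimal sketch in plain Python. The candidate values, the `build_tuning_space` helper, and the pruning rule are assumptions made for illustration and are not taken from the actual tuning script; only the parameter name matrix_instr_nonkdim and the mfma16/mfma32 distinction come from this PR.

```python
from itertools import product

# Hypothetical candidate values; the real script's ranges may differ.
BLOCK_SIZE_M = [16, 32, 64]
BLOCK_SIZE_N = [16, 32, 64]
BLOCK_SIZE_K = [32, 64]
# Assumed mapping: 16 requests mfma16, 32 requests mfma32.
MATRIX_INSTR_NONKDIM = [16, 32]


def build_tuning_space():
    """Enumerate candidate configs, including matrix_instr_nonkdim."""
    configs = []
    for bm, bn, bk, nonkdim in product(
        BLOCK_SIZE_M, BLOCK_SIZE_N, BLOCK_SIZE_K, MATRIX_INSTR_NONKDIM
    ):
        # Assumed pruning rule: skip tiles smaller than the instruction's
        # non-K dimension, since such a tile cannot hold one full MFMA.
        if bm < nonkdim or bn < nonkdim:
            continue
        configs.append(
            {
                "BLOCK_SIZE_M": bm,
                "BLOCK_SIZE_N": bn,
                "BLOCK_SIZE_K": bk,
                "matrix_instr_nonkdim": nonkdim,
            }
        )
    return configs


if __name__ == "__main__":
    space = build_tuning_space()
    print(f"{len(space)} candidate configs")
```

Treating the instruction size as one more axis of the search lets the tuner decide per problem size whether mfma16 or mfma32 is faster, rather than fixing the choice up front.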