benjaminulmer closed this 1 year ago
I assume you are using the re-tuning script for this PR
It's a combination of retuning and regular tuning. All the kernels added are genuinely new kernels.
This PR is ready to merge. The failed tests are unrelated to these changes.
This patch causes a regression for large GEMMs (SWDEV-375718). The reason seems to be that there are no tuning points for larger M values (this one stops at 16), so some large GEMMs are picking the kernels tuned for the M=16 case (very small tile size, hence the perf regression). I'm doing some extra tuning for larger M and will update the PR.
I added some new exact sizes, with large M, following the pattern of the other tunings in this commit. I confirmed it fixes the regression.
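For anyone following along, here is a toy sketch of why the missing large-M tuning points cause the regression and why adding them fixes it. It is not the library's actual selection code: the nearest-size rule, the `pick_kernel` helper, the sizes, and the kernel names are all assumptions made up for illustration.

```python
import math

def pick_kernel(tuned, m, n, k):
    """Return the kernel of the tuned size closest to (m, n, k) in log2 space.

    Assumption: a simple nearest-neighbour rule stands in for the real
    kernel-selection logic, which is not shown in this thread.
    """
    def dist(size):
        return sum((math.log2(a) - math.log2(b)) ** 2
                   for a, b in zip(size, (m, n, k)))
    return tuned[min(tuned, key=dist)]

# Hypothetical tuning table whose largest M is 16 (kernel names are invented).
small_only = {
    (1, 4096, 4096): "tile_16x16",
    (16, 4096, 4096): "tile_16x16",
}

# Same table plus one hypothetical large-M tuning point.
with_large_m = dict(small_only)
with_large_m[(4096, 4096, 4096)] = "tile_256x128"

# A large GEMM falls back to the small-tile kernel when no large-M point exists.
before = pick_kernel(small_only, 4096, 4096, 4096)
# With the large-M point present, the large-tile kernel is selected instead.
after = pick_kernel(with_large_m, 4096, 4096, 4096)
print(before, after)
```

In this toy model the small-M problems still map to the small-tile kernel after the extra point is added, so only the large-M lookups change.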
Only the Retune tool was used, but unfortunately new kernels were added due to a known issue with the merge script.
Tuning for SWDEV-372453
All sizes are new and there are some new kernels as well.