ROCm / rocBLAS

Next generation BLAS implementation for ROCm platform
https://rocm.docs.amd.com/projects/rocBLAS/en/latest/

aldebaran BBS tuning #1284

Closed benjaminulmer closed 1 year ago

benjaminulmer commented 1 year ago

Tuning for SWDEV-372453

All sizes are new and there are some new kernels as well.

benjaminulmer commented 1 year ago

I assume you are using the re-tuning script for this PR

It's a combination of retuning and regular tuning. All the kernels added are truly new kernels.

babakpst commented 1 year ago

This PR is ready to merge. The failed tests are unrelated to these changes.

nielenventer commented 1 year ago

This patch causes a regression for large GEMMs (SWDEV-375718). The reason seems to be that there are no tuning points for larger M values (this one stops at 16), so some large GEMMs are picking the kernels tuned for the M=16 case (very small tile size, hence the perf regression). I'm doing some extra tuning for larger M and will update the PR.

nielenventer commented 1 year ago

I added some new exact sizes, with large M, following the pattern of the other tunings in this commit. I confirmed it fixes the regression.

Only the Retune tool was used, but unfortunately new kernels were added due to a known issue with the merge script.
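
For readers following the thread, the regression and its fix can be illustrated with a toy model of nearest-size kernel selection. This is only a sketch under stated assumptions: the tuning table, the kernel names, the problem sizes, and the log-space distance metric are all hypothetical, and it is not how rocBLAS or Tensile actually selects kernels. It only shows why a table whose largest M is 16 can hand a small-tile kernel to a large GEMM, and why adding an exact size with large M avoids that.

```python
import math

# Hypothetical tuning table: (M, N, K) -> kernel name.
# Sizes and kernel names are invented for illustration only.
TUNED = {
    (1, 4096, 4096): "MT16x16_small",
    (8, 4096, 4096): "MT16x16_small",
    (16, 4096, 4096): "MT16x32_small",
}


def log_distance(a, b):
    """Euclidean distance between two sizes in log2 space (an assumed metric)."""
    return math.sqrt(sum((math.log2(x) - math.log2(y)) ** 2 for x, y in zip(a, b)))


def pick_kernel(table, size):
    """Return the kernel of the tuned size closest to the requested size."""
    nearest = min(table, key=lambda tuned: log_distance(tuned, size))
    return table[nearest]


large = (8192, 4096, 4096)

# With tuning points only up to M=16, the large GEMM falls back to the M=16 entry.
before = pick_kernel(TUNED, large)

# Adding an exact size with large M gives the large GEMM its own entry.
patched = dict(TUNED)
patched[(8192, 4096, 4096)] = "MT256x128_large"
after = pick_kernel(patched, large)

print("before:", before)
print("after: ", after)
```

In this toy model the large problem first resolves to the small-tile kernel attached to the M=16 entry, and resolves to the large-tile kernel once the exact size is present.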