cp2k / dbcsr

DBCSR: Distributed Block Compressed Sparse Row matrix library
https://cp2k.github.io/dbcsr/
GNU General Public License v2.0

Discussion on tuning machinery #805

Open alazzaro opened 3 weeks ago

alazzaro commented 3 weeks ago

Follow up of https://github.com/cp2k/dbcsr/pull/804

RMeli commented 3 weeks ago
  • The performance gain with the tuned A100 kernels is minor compared to using the P100 kernels, just as the tuned P100 kernels work reasonably well for V100.

  • It is better to use the full set of autotuned and predicted kernels from the previous GPU generation than to use only a relatively small set of autotuned kernels.

From the comments above (see https://github.com/cp2k/dbcsr/pull/804#issuecomment-2167716134), it looks like

Then, the strategy will be to rename the file/parameters in "AMD" and "NVIDIA" and drop the specific GPU version. As I said, I will add a generic kernel which will be good enough for all cases we don't cover with autotuning.

is a good compromise to move forward. But I'm no expert on this, so it's good to hear what people think about this issue.
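To make the proposed strategy concrete, here is a minimal sketch (hypothetical names and parameter values, not DBCSR's actual API) of keeping autotuned parameters per vendor ("nvidia"/"amd") rather than per GPU model, with a generic kernel as the fallback for block sizes not covered by autotuning:

```python
# Autotuned parameters keyed by (vendor, m, n, k); the entries and
# parameter names here are purely illustrative.
TUNED = {
    ("nvidia", 23, 23, 23): {"algorithm": "tiny", "tile_m": 2, "tile_n": 2},
    ("amd", 23, 23, 23): {"algorithm": "small", "tile_m": 4, "tile_n": 4},
}

# Generic parameters assumed to be "good enough" for all uncovered cases,
# as suggested in the quoted comment.
GENERIC = {"algorithm": "medium", "tile_m": 1, "tile_n": 1}

def kernel_parameters(vendor, m, n, k):
    """Return autotuned parameters if available, else the generic fallback."""
    return TUNED.get((vendor, m, n, k), GENERIC)
```

Under this scheme, `kernel_parameters("nvidia", 23, 23, 23)` returns the vendor-level tuned set, while an uncovered triple such as `kernel_parameters("nvidia", 7, 7, 7)` falls back to the generic kernel, so no per-GPU-model files are needed.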

hfp commented 3 weeks ago

Then, the strategy will be to rename the file/parameters in "AMD" and "NVIDIA" and drop the specific GPU version.

Good idea. In particular, a specific tuning may also need maintenance, given that the underlying runtime can change over time (e.g., a new CUDA version). This also opens a reasonable option to tune/refresh for the latest/deployed GPU and to naturally phase out some tuning for older GPUs (not saying they would no longer run).