Open alazzaro opened 3 weeks ago
The performance gain with the tuned A100 kernels is minor compared to using the P100 kernels, just as the tuned P100 kernels work reasonably well for V100.
It is better to use the full set of autotuned and predicted kernels from the previous GPU generation than to use only a relatively small set of autotuned kernels.
From the comments above (see https://github.com/cp2k/dbcsr/pull/804#issuecomment-2167716134), it looks like
Then, the strategy will be to rename the file/parameters to "AMD" and "NVIDIA" and drop the specific GPU version. As I said, I will add a generic kernel which will be good enough for all cases we don't cover with autotuning.
is a good compromise to move forward. But I'm no expert on this, so it's good to hear what people think about this issue.
Then, the strategy will be to rename the file/parameters to "AMD" and "NVIDIA" and drop the specific GPU version.
Good idea, in particular since a specific tuning may also need maintenance, given that the underlying runtime version can change over time (e.g., a new CUDA version). This also opens a reasonable option to tune or refresh for the latest/deployed GPU, and to naturally phase out tuning for older GPUs (which is not to say they would stop running).
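To make the proposal concrete, here is a minimal sketch of the lookup it implies: one parameter table per vendor instead of per GPU model, with a generic kernel for anything autotuning does not cover. All names (`TUNED_PARAMETERS`, `GENERIC_KERNEL`, `select_kernel`) and all parameter values are hypothetical and only for illustration; this is not DBCSR's actual code or file format.

```python
from typing import Dict, Tuple

# (m, n, k) block sizes of a small matrix multiplication.
Triplet = Tuple[int, int, int]
KernelParams = Dict[str, object]

# Hypothetical vendor-level tables, replacing per-GPU tables (P100, V100, A100, ...).
# The entries are made-up placeholders, not real tuning results.
TUNED_PARAMETERS: Dict[str, Dict[Triplet, KernelParams]] = {
    "NVIDIA": {(23, 23, 23): {"algorithm": "medium", "threads": 128}},
    "AMD": {(23, 23, 23): {"algorithm": "small", "threads": 64}},
}

# Hypothetical generic kernel used for every case not covered by autotuning.
GENERIC_KERNEL: KernelParams = {"algorithm": "generic", "threads": 256}


def select_kernel(vendor: str, mnk: Triplet) -> KernelParams:
    """Return the tuned parameters for (vendor, mnk), else the generic kernel."""
    return TUNED_PARAMETERS.get(vendor, {}).get(mnk, GENERIC_KERNEL)


if __name__ == "__main__":
    print(select_kernel("NVIDIA", (23, 23, 23)))  # tuned entry
    print(select_kernel("NVIDIA", (7, 7, 7)))     # falls back to generic
```

With this layout, refreshing the tuning for the latest GPU only means replacing entries in the vendor table; older GPUs keep running on whatever entries remain, or on the generic kernel.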
Follow-up to https://github.com/cp2k/dbcsr/pull/804