cp2k / dbcsr

DBCSR: Distributed Block Compressed Sparse Row matrix library
https://cp2k.github.io/dbcsr/
GNU General Public License v2.0

Discussion on tuning machinery #805

Open alazzaro opened 3 weeks ago

alazzaro commented 3 weeks ago

Follow up of https://github.com/cp2k/dbcsr/pull/804

RMeli commented 3 weeks ago
  • The performance gain with the tuned A100 kernels is minor compared to using the P100 kernels, just as the tuned P100 kernels work reasonably well for V100.

  • It is better to use the full set of autotuned and predicted kernels from the previous GPU generation than to use only a relatively small set of autotuned kernels.

From the comments above (see https://github.com/cp2k/dbcsr/pull/804#issuecomment-2167716134), it looks like

Then, the strategy will be to rename the file/parameters in "AMD" and "NVIDIA" and drop the specific GPU version. As I said, I will add a generic kernel which will be good enough for all cases we don't cover with autotuning.

is a good compromise to move forward. But I'm no expert on this, so it's good to hear what people think about this issue.
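To make the proposed strategy concrete, here is a minimal sketch (hypothetical names and parameter values, not DBCSR's actual API) of keeping autotuned parameters per vendor ("nvidia"/"amd") rather than per GPU model, with a generic kernel as the fallback for block sizes not covered by autotuning:

```python
# Autotuned parameters keyed by (vendor, m, n, k); the entries and
# parameter names here are purely illustrative.
TUNED = {
    ("nvidia", 23, 23, 23): {"algorithm": "tiny", "tile_m": 2, "tile_n": 2},
    ("amd", 23, 23, 23): {"algorithm": "small", "tile_m": 4, "tile_n": 4},
}

# Generic parameters assumed to be "good enough" for all uncovered cases,
# as suggested in the quoted comment.
GENERIC = {"algorithm": "medium", "tile_m": 1, "tile_n": 1}

def kernel_parameters(vendor, m, n, k):
    """Return autotuned parameters if available, else the generic fallback."""
    return TUNED.get((vendor, m, n, k), GENERIC)
```

Under this scheme, `kernel_parameters("nvidia", 23, 23, 23)` returns the vendor-level tuned set, while an uncovered triple such as `kernel_parameters("nvidia", 7, 7, 7)` falls back to the generic kernel, so no per-GPU-model files are needed.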

hfp commented 3 weeks ago

Then, the strategy will be to rename the file/parameters in "AMD" and "NVIDIA" and drop the specific GPU version.

Good idea. In particular, a specific tuning may also need maintenance, given that the underlying runtime can change over time (e.g., a new CUDA version). This also opens a reasonable option to tune/refresh for the latest/deployed GPU and to naturally phase out some tuning for older GPUs (not saying they would no longer run).