
LatticeQCD / SIMULATeQCD

SIMULATeQCD is a multi-GPU Lattice QCD framework that makes it easy for physicists to implement lattice QCD formulas while still providing competitive performance.

https://latticeqcd.github.io/SIMULATeQCD/

MIT License

30 stars, 12 forks

Implement an autotuner #25

Closed: clarkedavida closed this issue 1 year ago

clarkedavida commented 2 years ago

Right now, we do not really pay attention to the block size, but it will matter, especially once we use multiple right-hand sides (RHS). We should implement an autotuner like the one QUDA has. One has to think of a good design. The autotuner should be used by every operator call, and only once in the beginning of the program. It should then write the tuned parameters to a file and read them back in again if possible.
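To make the "every operator call, but only once" requirement concrete, here is a minimal, language-agnostic sketch in Python (SIMULATeQCD itself is C++/CUDA, so this is not the project's code). The names `BlockSizeTuner`, `measure_seconds`, the candidate list, and the idea of keying results by an operator name are all assumptions made for illustration. Each operator asks the tuner for its block size on every call, and the timing sweep only runs the first time a given name is seen.

```python
import time


def measure_seconds(kernel, size, repeats=3):
    """Average wall-clock time of kernel(size); hypothetical timing helper."""
    kernel(size)  # warm-up call, not timed
    start = time.perf_counter()
    for _ in range(repeats):
        kernel(size)
    return (time.perf_counter() - start) / repeats


class BlockSizeTuner:
    """Tunes each operator once and reuses the result on later calls."""

    def __init__(self, candidates=(32, 64, 128, 256, 512), measure=measure_seconds):
        self.candidates = candidates  # assumed list of block sizes to try
        self.measure = measure        # callable (kernel, size) -> seconds
        self.tuned = {}               # operator name -> best block size

    def block_size(self, name, kernel):
        # Run the timing sweep only the first time this operator is seen.
        if name not in self.tuned:
            self.tuned[name] = min(
                self.candidates, key=lambda size: self.measure(kernel, size)
            )
        return self.tuned[name]
```

An operator would then call something like `tuner.block_size("dslash", launch_dslash)` before each launch, where `launch_dslash` is a hypothetical callable that runs the kernel with a given block size. For multiple RHS, the key could also include the number of right-hand sides, since the best block size may depend on it.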

clarkedavida commented 1 year ago

"The autotuner should be used by every operator call, and only once in the beginning of the program."

We could use the following strategy: a test run tries different block sizes and determines which performs best. It saves those block sizes in an output file, say autotuner.d. On a production run, the code reads in autotuner.d and uses those block sizes. If there is no autotuner.d, it uses default values.
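The save/load part of this strategy could look roughly like the following Python sketch (again only an illustration, not SIMULATeQCD code). The JSON layout of autotuner.d, the default of 256, and the function names `save_tuning`, `load_tuning`, and `block_size_for` are assumptions; the issue does not specify a file format.

```python
import json
import os

DEFAULT_BLOCK_SIZE = 256  # hypothetical fallback when nothing was tuned


def save_tuning(tuned, path="autotuner.d"):
    """Test run: write the tuned block sizes (operator name -> size) to disk."""
    with open(path, "w") as f:
        json.dump(tuned, f, indent=2, sort_keys=True)


def load_tuning(path="autotuner.d"):
    """Production run: read tuned block sizes, or return {} if no file exists."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def block_size_for(name, tuned, default=DEFAULT_BLOCK_SIZE):
    """Use the tuned value for this operator, falling back to the default."""
    return tuned.get(name, default)
```

A production run would call `load_tuning()` once at startup and then look up each operator with `block_size_for`. Operators missing from the file fall back to the default, so a stale or partial autotuner.d still works.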