Closed clarkedavida closed 1 year ago
"The autotuner should be used by every operator call, and only once in the beginning of the program."
We could use the following strategy: a test run tries different block sizes, measures which one performs best, and saves the results to an output file, say `autotuner.d`. A production run reads `autotuner.d` and uses those block sizes. If there is no `autotuner.d`, it falls back to default values.
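The strategy above could be sketched roughly as follows. This is only an illustration under stated assumptions, not existing code: the names `TuneCache`, `loadTuneCache`, `saveTuneCache`, `tuneBlockSize`, and `blockSizeFor` are hypothetical, and so is the file format (one `name blocksize` pair per line). The kernel is represented by a plain host callable; a real GPU kernel would need a device synchronization before the timer is stopped.

```cpp
#include <chrono>
#include <fstream>
#include <functional>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical cache: operator/kernel name -> tuned block size.
using TuneCache = std::map<std::string, int>;

// Read "name blocksize" pairs from the tuning file.
// Returns an empty cache if the file does not exist or cannot be opened.
// Assumes names contain no whitespace.
TuneCache loadTuneCache(const std::string& path) {
    TuneCache cache;
    std::ifstream in(path);
    std::string name;
    int blockSize = 0;
    while (in >> name >> blockSize) {
        cache[name] = blockSize;
    }
    return cache;
}

// Write the cache back out, one "name blocksize" pair per line.
void saveTuneCache(const std::string& path, const TuneCache& cache) {
    std::ofstream out(path);
    for (const auto& entry : cache) {
        out << entry.first << ' ' << entry.second << '\n';
    }
}

// Test run: time the kernel once per candidate block size and return the
// fastest candidate. Returns `fallback` if there are no candidates.
// For a GPU kernel, `kernel` would have to synchronize before returning.
int tuneBlockSize(const std::function<void(int)>& kernel,
                  const std::vector<int>& candidates,
                  int fallback) {
    int best = fallback;
    double bestTime = std::numeric_limits<double>::infinity();
    for (int blockSize : candidates) {
        const auto start = std::chrono::steady_clock::now();
        kernel(blockSize);
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        if (elapsed.count() < bestTime) {
            bestTime = elapsed.count();
            best = blockSize;
        }
    }
    return best;
}

// Production run: use the tuned value if present, else the default.
int blockSizeFor(const TuneCache& cache, const std::string& name, int fallback) {
    const auto it = cache.find(name);
    return it != cache.end() ? it->second : fallback;
}
```

In this sketch a test run would call `tuneBlockSize` for each operator, store the results in a `TuneCache`, and call `saveTuneCache("autotuner.d", cache)`. A production run would call `loadTuneCache("autotuner.d")` once at startup and then `blockSizeFor` for each operator. A single timed launch per candidate is noisy, so averaging several launches would likely be needed in practice.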
Right now, we do not really care about the block size, but it will matter, especially once we use multiple right-hand sides (RHS). We should implement an autotuner like the one QUDA has. One has to think of a good design. The autotuner should be used by every operator call, and only once in the beginning of the program. It should then write out the tuned parameters and, where possible, read them back in on later runs.