LatticeQCD / SIMULATeQCD

SIMULATeQCD is a multi-GPU Lattice QCD framework that makes it easy for physicists to implement lattice QCD formulas while still providing competitive performance.
https://latticeqcd.github.io/SIMULATeQCD/
MIT License

Create a benchmark suite #3

Open: luhuhis opened this issue 2 years ago

luhuhis commented 2 years ago

The goal is to make reproducible benchmarks easy to run on different systems. The suite should include memory-bandwidth and FLOP measurements/counts and use algorithms with different arithmetic intensities (e.g. RHMC, gradientFlow).
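
Measuring both FLOPs and memory traffic is what lets the suite compare algorithms of different arithmetic intensity. Below is a minimal sketch of that bookkeeping, assuming each benchmark reports a FLOP count, a byte count and a run time per kernel. The struct and function names are hypothetical, and the bound used here is the standard roofline model, not something this issue specifies.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical record of one benchmarked kernel: the FLOP and byte counts come
// from counting the routine's operations and memory traffic, the time from a
// measurement.
struct KernelMeasurement {
    double flops;    // floating-point operations performed
    double bytes;    // bytes read from and written to memory
    double seconds;  // measured run time
};

// Achieved floating-point rate in FLOP/s.
double flopRate(const KernelMeasurement& m) {
    assert(m.seconds > 0.0);
    return m.flops / m.seconds;
}

// Achieved memory bandwidth in bytes/s.
double bandwidth(const KernelMeasurement& m) {
    assert(m.seconds > 0.0);
    return m.bytes / m.seconds;
}

// Arithmetic intensity in FLOP per byte.
double arithmeticIntensity(const KernelMeasurement& m) {
    assert(m.bytes > 0.0);
    return m.flops / m.bytes;
}

// Roofline bound: attainable FLOP/s is limited either by the peak FLOP rate
// or by how fast memory can deliver data (intensity * peak bandwidth).
double rooflineBound(double intensity, double peakFlops, double peakBandwidth) {
    return std::min(peakFlops, intensity * peakBandwidth);
}
```

A kernel whose bound is set by the bandwidth term is memory-bound and is best judged by its achieved bandwidth; otherwise its FLOP rate is the relevant figure.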

We also need to check whether timings and FLOP counts already exist for the main routines (inverter, dslash, force, ...). If not, implement them. We could create a new global (extern) object, similar to stdLogger, that measures the timings and provides common formatting for the output, for example:

    FlopCounter.setFlops("Dslash", 1234 * size_h);
    FlopCounter.start("Dslash");
    // ... call Dslash ...
    FlopCounter.stop("Dslash");

And at the end of main():

    FlopCounter.printResults();
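
A minimal sketch of such an object, assuming the method names from the proposal above (setFlops, start, stop, printResults) and assuming that setFlops gives the FLOP count of a single call, which stop() then adds to a running total. FlopCounterType and the accessors seconds(), totalFlops() and calls() are hypothetical additions for illustration. The sketch uses std::chrono::steady_clock, i.e. host wall-clock time, so when timing GPU kernels the device must be synchronized before stop() is called.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical sketch of the proposed FlopCounter; not an existing SIMULATeQCD
// class. Measures host wall-clock time only.
class FlopCounterType {
public:
    // Set the FLOP count of one call of the routine `name`.
    void setFlops(const std::string& name, double flopsPerCall) {
        entries_[name].flopsPerCall = flopsPerCall;
    }

    void start(const std::string& name) {
        Entry& e = entries_[name];
        assert(!e.running && "start() called twice without stop()");
        e.running = true;
        e.begin = Clock::now();
    }

    // Accumulate the elapsed time and the FLOPs of one call.
    void stop(const std::string& name) {
        const auto end = Clock::now();
        Entry& e = entries_.at(name);
        assert(e.running && "stop() called without start()");
        e.running = false;
        e.seconds += std::chrono::duration<double>(end - e.begin).count();
        e.totalFlops += e.flopsPerCall;
        ++e.calls;
    }

    double seconds(const std::string& name) const { return entries_.at(name).seconds; }
    double totalFlops(const std::string& name) const { return entries_.at(name).totalFlops; }
    long calls(const std::string& name) const { return entries_.at(name).calls; }

    // One line per routine, in the spirit of the proposed output format.
    void printResults() const {
        for (const auto& [name, e] : entries_) {
            const double rate = e.seconds > 0.0 ? e.totalFlops / e.seconds : 0.0;
            std::printf("Needed %g seconds for %g FLOP in %s (%g FLOP per second, %ld calls)\n",
                        e.seconds, e.totalFlops, name.c_str(), rate, e.calls);
        }
    }

private:
    using Clock = std::chrono::steady_clock;
    struct Entry {
        double flopsPerCall = 0.0;
        double totalFlops = 0.0;
        double seconds = 0.0;
        long calls = 0;
        bool running = false;
        Clock::time_point begin{};
    };
    std::map<std::string, Entry> entries_;
};

// Single global instance (C++17 inline variable), so the snippet above
// compiles as written once size_h is defined.
inline FlopCounterType FlopCounter;
```

Accumulating over repeated start/stop pairs means a routine called inside a solver loop yields one total per run rather than one line per call.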

The output could look like this: "Needed 123 seconds for 456 ExaFLOP in dslash (3.7 ExaFLOP per second)."
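
A minimal sketch of that formatting, assuming totals are kept in FLOP and seconds; withSiPrefix and formatTiming are hypothetical names. The rate is simply total FLOP divided by total time. Numbers are scaled to the largest SI prefix up to Exa and printed with three significant digits, so the example above comes out as 3.71 rather than 3.7 ExaFLOP per second.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Scale `value` to the largest SI prefix up to Exa (1e18) and append `unit`,
// printing three significant digits, e.g. 2500 with "FLOP" -> "2.5 KiloFLOP".
std::string withSiPrefix(double value, const std::string& unit) {
    static const char* const prefixes[] = {"", "Kilo", "Mega", "Giga", "Tera", "Peta", "Exa"};
    int i = 0;
    while (i < 6 && value >= 1000.0) {
        value /= 1000.0;
        ++i;
    }
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.3g %s%s", value, prefixes[i], unit.c_str());
    return buf;
}

// One result line in the proposed format; the rate is flops / seconds.
std::string formatTiming(const std::string& name, double seconds, double flops) {
    assert(seconds > 0.0);
    char secs[32];
    std::snprintf(secs, sizeof secs, "%.3g", seconds);
    return "Needed " + std::string(secs) + " seconds for " + withSiPrefix(flops, "FLOP") +
           " in " + name + " (" + withSiPrefix(flops / seconds, "FLOP per second") + ")";
}
```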

We already have the classes MicroTimer and CudaStopWatch. The former only measures CPU time and can give incorrect results for GPU code. Remove both and replace their current usages with the new object!
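
The problem with a pure CPU timer is that GPU kernel launches return to the host before the kernel has finished, so stopping the clock right after a launch mostly measures launch overhead. A minimal sketch of one fix, assuming a synchronization hook is sufficient: the hypothetical SyncedTimer below calls a user-supplied function before reading the clock. In a CUDA build that hook could wrap cudaDeviceSynchronize(); here the hypothetical fakeLaunch uses std::async as a stand-in for an asynchronous launch so the sketch runs without a GPU.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Hypothetical timer (not SIMULATeQCD's StopWatch): it calls a synchronization
// hook before reading the host clock, so asynchronous work launched between
// start() and stop() is included in the measured interval.
class SyncedTimer {
public:
    explicit SyncedTimer(std::function<void()> sync) : sync_(std::move(sync)) {
        assert(sync_ && "a synchronization hook is required");
    }

    void start() {
        sync_();  // drain earlier work so it is not charged to this interval
        begin_ = Clock::now();
    }

    // Returns the elapsed time of the interval in seconds.
    double stop() {
        sync_();  // wait for all work launched since start()
        return std::chrono::duration<double>(Clock::now() - begin_).count();
    }

private:
    using Clock = std::chrono::steady_clock;
    std::function<void()> sync_;
    Clock::time_point begin_{};
};

// Hypothetical stand-in for an asynchronous kernel launch: returns at once
// while a worker thread "computes" (sleeps) for `ms` milliseconds.
std::future<void> fakeLaunch(int ms) {
    return std::async(std::launch::async, [ms] {
        std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    });
}
```

Synchronizing the whole device stalls the host on every stop(); CUDA events (cudaEventRecord and cudaEventElapsedTime) are an alternative that timestamp the work on the GPU's own stream.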

lukas-mazur commented 2 years ago

MicroTimer and CudaStopWatch have been replaced by StopWatch in commit c00eec4bb496f9c52748b3e957c447e17a79cdfe