etmc / tmLQCD

tmLQCD is a freely available software suite providing a set of tools for lattice QCD simulations. It is mainly an HMC implementation (including PHMC and RHMC) for Wilson, Wilson clover and Wilson twisted mass fermions, together with inverters for different versions of the Dirac operator. The code is fully parallelised and ships with optimisations for various modern architectures, such as commodity PC clusters and the Blue Gene family.
http://www.itkp.uni-bonn.de/~urbach/software.html
GNU General Public License v3.0

Real-world HMC QPhiX AVX512 speed-up #391

Open kostrzewa opened 7 years ago

kostrzewa commented 7 years ago

This is for degenerate twisted mass (no clover)

tmLQCD rgmixedcg, compiled with -xCORE-AVX2

# Time for detratio1 monomial derivative: 1.254169e+01 s
# Time for detratio1 monomial derivative: 1.218095e+01 s
# Time for detratio1 monomial derivative: 1.244291e+01 s
# Time for detratio1 monomial derivative: 1.248843e+01 s
# Time for detratio1 monomial derivative: 1.263386e+01 s
# Time for detratio1 monomial derivative: 1.274833e+01 s
# Time for detratio1 monomial derivative: 1.260397e+01 s
# Time for detratio1 monomial derivative: 1.196881e+01 s
# Time for detratio1 monomial derivative: 1.249595e+01 s
# Time for detratio1 monomial derivative: 1.263323e+01 s
# Time for detratio1 monomial derivative: 1.198081e+01 s
# Time for detratio1 monomial derivative: 1.745349e+01 s
# Time for detratio1 monomial derivative: 1.222579e+01 s

tmLQCD+QPhiX AVX512:

# Time for detratio1 monomial derivative: 5.496177e+00 s
# Time for detratio1 monomial derivative: 5.364823e+00 s
# Time for detratio1 monomial derivative: 5.299350e+00 s
# Time for detratio1 monomial derivative: 5.268174e+00 s
# Time for detratio1 monomial derivative: 5.719475e+00 s
# Time for detratio1 monomial derivative: 6.428401e+00 s
# Time for detratio1 monomial derivative: 5.312237e+00 s
# Time for detratio1 monomial derivative: 5.389898e+00 s
# Time for detratio1 monomial derivative: 5.870602e+00 s
# Time for detratio1 monomial derivative: 5.376881e+00 s
# Time for detratio1 monomial derivative: 5.290776e+00 s
# Time for detratio1 monomial derivative: 5.427299e+00 s
# Time for detratio1 monomial derivative: 5.271687e+00 s

So the factor of 3.5-4 seen on a single node drops to roughly 2.3 at scale (about 12.5 s versus 5.4 s per derivative in the logs above). However, there is hope that the coming strong-scaling features in QPhiX will improve this.
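The speed-up can be checked directly from the two logs. The following is a minimal sketch, not part of tmLQCD: the names `parse_times`, `AVX2_LOG` and `QPHIX_LOG` are made up for illustration, and the regular expression only assumes the line format shown in this thread. It uses the median rather than the mean so that the single 17.45 s outlier in the AVX2 run does not skew the comparison.

```python
import re
import statistics

# Timing lines copied verbatim from the two logs in this thread.
AVX2_LOG = """\
# Time for detratio1 monomial derivative: 1.254169e+01 s
# Time for detratio1 monomial derivative: 1.218095e+01 s
# Time for detratio1 monomial derivative: 1.244291e+01 s
# Time for detratio1 monomial derivative: 1.248843e+01 s
# Time for detratio1 monomial derivative: 1.263386e+01 s
# Time for detratio1 monomial derivative: 1.274833e+01 s
# Time for detratio1 monomial derivative: 1.260397e+01 s
# Time for detratio1 monomial derivative: 1.196881e+01 s
# Time for detratio1 monomial derivative: 1.249595e+01 s
# Time for detratio1 monomial derivative: 1.263323e+01 s
# Time for detratio1 monomial derivative: 1.198081e+01 s
# Time for detratio1 monomial derivative: 1.745349e+01 s
# Time for detratio1 monomial derivative: 1.222579e+01 s
"""

QPHIX_LOG = """\
# Time for detratio1 monomial derivative: 5.496177e+00 s
# Time for detratio1 monomial derivative: 5.364823e+00 s
# Time for detratio1 monomial derivative: 5.299350e+00 s
# Time for detratio1 monomial derivative: 5.268174e+00 s
# Time for detratio1 monomial derivative: 5.719475e+00 s
# Time for detratio1 monomial derivative: 6.428401e+00 s
# Time for detratio1 monomial derivative: 5.312237e+00 s
# Time for detratio1 monomial derivative: 5.389898e+00 s
# Time for detratio1 monomial derivative: 5.870602e+00 s
# Time for detratio1 monomial derivative: 5.376881e+00 s
# Time for detratio1 monomial derivative: 5.290776e+00 s
# Time for detratio1 monomial derivative: 5.427299e+00 s
# Time for detratio1 monomial derivative: 5.271687e+00 s
"""

# Assumed line format, taken from the logs above; other output is ignored.
TIME_RE = re.compile(r"# Time for (\S+) monomial derivative: (\S+) s")


def parse_times(log_text, monomial="detratio1"):
    """Return the derivative times (in seconds) logged for one monomial."""
    return [
        float(match.group(2))
        for match in TIME_RE.finditer(log_text)
        if match.group(1) == monomial
    ]


avx2 = parse_times(AVX2_LOG)
qphix = parse_times(QPHIX_LOG)
speedup = statistics.median(avx2) / statistics.median(qphix)

print(f"median AVX2:  {statistics.median(avx2):.2f} s")   # 12.50 s
print(f"median QPhiX: {statistics.median(qphix):.2f} s")  # 5.38 s
print(f"speed-up:     {speedup:.2f}")                     # 2.32
```

For a full run, the two strings could be replaced by the contents of the respective output files; `parse_times` skips any line that does not match the pattern.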

kostrzewa commented 7 years ago

In fact, for this particular lattice (A40.48, a simulation that Francesco is doing), QPhiX is faster than DDalphaAMG for the lightest monomial... Of course, this won't be true at the physical point.