tmLQCD is a freely available software suite providing a set of tools for lattice QCD simulations. It is mainly an HMC implementation (including PHMC and RHMC) for Wilson, Wilson clover and Wilson twisted mass fermions, together with inverters for different versions of the Dirac operator. The code is fully parallelised and ships with optimisations for various modern architectures, such as commodity PC clusters and the Blue Gene family.
# Time for detratio1 monomial derivative: 1.254169e+01 s
# Time for detratio1 monomial derivative: 1.218095e+01 s
# Time for detratio1 monomial derivative: 1.244291e+01 s
# Time for detratio1 monomial derivative: 1.248843e+01 s
# Time for detratio1 monomial derivative: 1.263386e+01 s
# Time for detratio1 monomial derivative: 1.274833e+01 s
# Time for detratio1 monomial derivative: 1.260397e+01 s
# Time for detratio1 monomial derivative: 1.196881e+01 s
# Time for detratio1 monomial derivative: 1.249595e+01 s
# Time for detratio1 monomial derivative: 1.263323e+01 s
# Time for detratio1 monomial derivative: 1.198081e+01 s
# Time for detratio1 monomial derivative: 1.745349e+01 s
# Time for detratio1 monomial derivative: 1.222579e+01 s
tmLQCD+QPhiX AVX512:
# Time for detratio1 monomial derivative: 5.496177e+00 s
# Time for detratio1 monomial derivative: 5.364823e+00 s
# Time for detratio1 monomial derivative: 5.299350e+00 s
# Time for detratio1 monomial derivative: 5.268174e+00 s
# Time for detratio1 monomial derivative: 5.719475e+00 s
# Time for detratio1 monomial derivative: 6.428401e+00 s
# Time for detratio1 monomial derivative: 5.312237e+00 s
# Time for detratio1 monomial derivative: 5.389898e+00 s
# Time for detratio1 monomial derivative: 5.870602e+00 s
# Time for detratio1 monomial derivative: 5.376881e+00 s
# Time for detratio1 monomial derivative: 5.290776e+00 s
# Time for detratio1 monomial derivative: 5.427299e+00 s
# Time for detratio1 monomial derivative: 5.271687e+00 s
So the factor of 3.5-4 on a single node becomes about a factor of 2 at scale. However, there is hope that the coming strong-scaling features in QPhiX will improve this.
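The "about a factor of 2" figure can be checked directly from the log lines. Here is a minimal Python sketch, assuming the first, unlabelled block of timings is the tmLQCD rgmixedcg (-xCORE-AVX2) run and the second is the tmLQCD+QPhiX AVX512 run. The helper names `derivative_times` and `speedup` are hypothetical and not part of tmLQCD, and only the first three lines of each run are reproduced to keep the example short.

```python
import re
from statistics import median

# Matches log lines of the form
# "# Time for detratio1 monomial derivative: 1.254169e+01 s"
TIME_RE = re.compile(r"^# Time for (\S+) monomial derivative: ([0-9.eE+-]+) s$")


def derivative_times(log_text, monomial="detratio1"):
    """Return the derivative timings (in seconds) logged for one monomial."""
    times = []
    for line in log_text.splitlines():
        m = TIME_RE.match(line.strip())
        if m and m.group(1) == monomial:
            times.append(float(m.group(2)))
    return times


def speedup(baseline_log, candidate_log, monomial="detratio1"):
    """Ratio of median timings, baseline over candidate."""
    base = median(derivative_times(baseline_log, monomial))
    cand = median(derivative_times(candidate_log, monomial))
    return base / cand


# Excerpts: first three lines of each run from the logs above.
# Assumption: the unlabelled first block is the rgmixedcg AVX2 run.
AVX2_LOG = """\
# Time for detratio1 monomial derivative: 1.254169e+01 s
# Time for detratio1 monomial derivative: 1.218095e+01 s
# Time for detratio1 monomial derivative: 1.244291e+01 s
"""
QPHIX_LOG = """\
# Time for detratio1 monomial derivative: 5.496177e+00 s
# Time for detratio1 monomial derivative: 5.364823e+00 s
# Time for detratio1 monomial derivative: 5.299350e+00 s
"""

print(f"speedup (median): {speedup(AVX2_LOG, QPHIX_LOG):.2f}")
```

The median is used rather than the mean because single slow steps, such as the 1.745349e+01 s entry in the first block, would otherwise skew the comparison.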
In fact, for this particular lattice (A40.48, a simulation that Francesco is doing), QPhiX is faster than DDalphaAMG for the lightest monomial... Of course, this won't be true at the physical point.
These timings are for degenerate twisted mass (no clover). The first set above is from tmLQCD rgmixedcg, compiled with -xCORE-AVX2.