tmLQCD is a freely available software suite providing a set of tools for lattice QCD simulations. It is mainly an HMC implementation (including PHMC and RHMC) for Wilson, Wilson clover and Wilson twisted mass fermions, together with inverters for different versions of the Dirac operator. The code is fully parallelised and ships with optimisations for various modern architectures, such as commodity PC clusters and the Blue Gene family.
# Trying to read gauge field from file conf.0240 in double precision.
# Constructing LEMON reader for file conf.0240 ...
found header xlf-info, will now read the message
found header ildg-format, will now read the message
found header ildg-binary-data, will now read the message
# Time spent reading 309 Gb was 27.9 s.
# Reading speed: 11.1 Gb/s (43.3 Mb/s per MPI process).
found header scidac-checksum, will now read the message
# Scidac checksums for gaugefield conf.0240:
# Calculated : A = 0x6ce26943 B = 0x57720413.
# Read from LIME headers: A = 0x6ce26943 B = 0x57720413.
# Reading ildg-format record:
# Precision = 64 bits (double).
# Lattice size: LX = 128, LY = 128, LZ = 128, LT = 256.
# Input parameters:
# Precision = 64 bits (double).
# Lattice size: LX = 128, LY = 128, LZ = 128, LT = 256.
# Finished reading gauge field.
# Computed plaquette value: 0.583358774141.
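As a sanity check on the I/O figures in the log above, the following sketch recomputes the gauge field size from the lattice dimensions. It assumes four SU(3) links per site, each stored as a 3x3 complex matrix in double precision, and assumes that "Gb" and "Mb" in the log mean decimal gigabytes and megabytes. The process count it prints is only inferred from the two logged rates; the log itself does not state it.

```python
# Lattice extents and timings copied from the log above.
LX, LY, LZ, LT = 128, 128, 128, 256
read_time_s = 27.9
total_rate_gb_s = 11.1
per_proc_rate_mb_s = 43.3

# Assumption: 4 links per site, each a 3x3 complex matrix, 8 bytes per real number.
bytes_per_site = 4 * 3 * 3 * 2 * 8
volume = LX * LY * LZ * LT
size_gb = volume * bytes_per_site / 1e9  # assumption: decimal units

# Aggregate read rate implied by the file size and the logged read time.
rate_gb_s = size_gb / read_time_s

# Number of MPI processes implied by the ratio of the two logged rates.
implied_procs = total_rate_gb_s * 1000 / per_proc_rate_mb_s

print(f"gauge field size: {size_gb:.1f} GB")
print(f"aggregate read rate: {rate_gb_s:.1f} GB/s")
print(f"implied MPI processes: {implied_procs:.0f}")
```

Under these assumptions the recomputed size and rate agree with the logged 309 Gb and 11.1 Gb/s, and the rates point to roughly 256 MPI processes.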
The first inversion will again be done with the initial MG setup with which the tuner was started:
# TM_QUDA: mu = 0.000540000000, kappa = 0.137972174000, csw = 1.611200000000
# TM_QUDA: using MG solver to invert operator with 2kappamu = 0.000149009948
# TM_QUDA: MG level 0, extent of (xyzt) dim 0: 32
# TM_QUDA: MG aggregation size set to: 4
# TM_QUDA: MG level 0, extent of (xyzt) dim 1: 32
# TM_QUDA: MG aggregation size set to: 4
# TM_QUDA: MG level 0, extent of (xyzt) dim 2: 32
# TM_QUDA: MG aggregation size set to: 4
# TM_QUDA: MG level 0, extent of (xyzt) dim 3: 64
# TM_QUDA: MG aggregation size set to: 4
# TM_QUDA: MG setting coarse mu scaling factor on level 0 to 1.000000
# TM_QUDA: MG level 1, extent of (xyzt) dim 0: 8
# TM_QUDA: MG aggregation size set to: 4
# TM_QUDA: MG level 1, extent of (xyzt) dim 1: 8
# TM_QUDA: MG aggregation size set to: 4
# TM_QUDA: MG level 1, extent of (xyzt) dim 2: 8
# TM_QUDA: MG aggregation size set to: 2
# TM_QUDA: MG level 1, extent of (xyzt) dim 3: 16
# TM_QUDA: MG aggregation size set to: 2
# TM_QUDA: MG setting coarse mu scaling factor on level 1 to 1.000000
# TM_QUDA: MG setting coarse mu scaling factor on level 2 to 30.000000
# TM_QUDA: Destroying MG Preconditioner Setup
# TM_QUDA: Performing MG Preconditioner Setup for gauge_id: 3.000000
# TM_QUDA: Generating MG Setup with mu = 0.000540000000 instead of 0.000540000000
# TM_QUDA: Time for MG_Preconditioner_Setup 3.509506e+02 s level: 4 proc_id: 0 /DERIV_MG_TUNE/cloverdetlight:cloverdet_derivative/solve_degenerate/invert_eo_degenerate_quda/MG_Preconditioner_Setup
# TM_QUDA: Time for reorder_spinor_eo_toQuda 5.094417e-02 s level: 4 proc_id: 0 /DERIV_MG_TUNE/cloverdetlight:cloverdet_derivative/solve_degenerate/invert_eo_degenerate_quda/reorder_spinor_eo_toQuda
GCR: Convergence at 350 iterations, L2 relative residual: iterated = 2.314415e-04, true = 2.314415e-04 (requested = 3.162278e-11)
# TM_QUDA: Time for invertQuda 1.227894e+03 s level: 4 proc_id: 0 /DERIV_MG_TUNE/cloverdetlight:cloverdet_derivative/solve_degenerate/invert_eo_degenerate_quda/invertQuda
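The correct behaviour would be for the tuned setup to be used already for the very first inversion on the new configuration. The current behaviour can be extremely wasteful if the initial setup does not converge or is very slow: in the log above, GCR stops after 350 iterations at a relative residual of 2.3e-04 (requested: 3.2e-11), after 351 s of MG setup and 1228 s in invertQuda.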
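The MG geometry reported in the log can be cross-checked with a few lines of arithmetic. The sketch below divides the level 0 extents by the logged aggregation sizes to obtain the level 1 extents, and repeats the step for level 2. It assumes that the logged extents are the local (per-process) extents and that each coarse extent is simply the fine extent divided by the aggregation size. The level 2 extents are not printed in the log and are therefore an inference.

```python
def coarsen(extents, aggregation):
    """Divide each extent by its aggregation size, requiring exact divisibility."""
    coarse = []
    for ext, agg in zip(extents, aggregation):
        if ext % agg != 0:
            raise ValueError(f"extent {ext} is not divisible by aggregation size {agg}")
        coarse.append(ext // agg)
    return coarse

# Values copied from the log above, in (x, y, z, t) order.
level0 = [32, 32, 32, 64]
agg0 = [4, 4, 4, 4]
agg1 = [4, 4, 2, 2]

level1 = coarsen(level0, agg0)
level2 = coarsen(level1, agg1)  # not shown in the log, inferred here

print(level1)  # [8, 8, 8, 16]
print(level2)  # [2, 2, 4, 8]
```

The level 1 result matches the extents logged for MG level 1.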
In short, when the MG autotuner has found a good setup on the first gauge configuration and the configuration is then switched, the first inversion on the new configuration is again done with the initial MG setup with which the tuner was started. Only after that inversion is the tuned setup applied.
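The expected behaviour can be illustrated with a minimal sketch of the bookkeeping involved. This is not tmLQCD or QUDA code: the class `MGSetupCache`, its methods, and the parameter name `mu_factor_level2` are all hypothetical and only model the idea that a change of gauge configuration should trigger a new setup with the most recently tuned parameters and not with the initial ones. The initial value 30.0 is the level 2 coarse mu scaling factor from the log; the tuned value is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MGSetupCache:
    """Hypothetical bookkeeping for MG setup parameters (illustration only)."""
    initial_params: dict
    tuned_params: Optional[dict] = None
    setup_gauge_id: Optional[int] = None
    setup_params: Optional[dict] = None
    history: list = field(default_factory=list)

    def record_tuning_result(self, params: dict) -> None:
        # Remember the tuner's best parameters across gauge configurations.
        self.tuned_params = dict(params)

    def ensure_setup(self, gauge_id: int) -> dict:
        # Prefer tuned parameters whenever they exist.
        wanted = self.tuned_params if self.tuned_params is not None else self.initial_params
        # Regenerate the setup if the gauge field or the parameters changed.
        if self.setup_gauge_id != gauge_id or self.setup_params != wanted:
            self.setup_gauge_id = gauge_id
            self.setup_params = dict(wanted)
            self.history.append((gauge_id, dict(wanted)))
        return self.setup_params


initial = {"mu_factor_level2": 30.0}  # value from the log above
tuned = {"mu_factor_level2": 45.0}    # placeholder, not taken from any run

cache = MGSetupCache(initial_params=initial)
first = dict(cache.ensure_setup(gauge_id=1))   # no tuning yet: initial setup
cache.record_tuning_result(tuned)              # tuner finishes on configuration 1
second = dict(cache.ensure_setup(gauge_id=2))  # new configuration: tuned setup at once

print(first)
print(second)
```

In this model the first inversion on the second configuration already sees the tuned parameters, which is the behaviour requested above.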