etmc / tmLQCD

tmLQCD is a freely available software suite providing a set of tools for lattice QCD simulations. It is mainly an HMC implementation (including PHMC and RHMC) for Wilson, Wilson clover and Wilson twisted mass fermions, together with inverters for different versions of the Dirac operator. The code is fully parallelised and ships with optimisations for various modern architectures, such as commodity PC clusters and the Blue Gene family.
http://www.itkp.uni-bonn.de/~urbach/software.html
GNU General Public License v3.0

apparent divergences in QUDA-MG in the HMC #508

Status: Closed (kostrzewa closed this issue 1 year ago)

kostrzewa commented 2 years ago

It seems that when the MG setup is refreshed for a particular monomial, the subsequent solve for another monomial (with tm_rho=0) fails to converge.

I wonder if running the refresh

https://github.com/etmc/tmLQCD/blob/77f264dcb334d696c848f7b5002fadf32f477287/quda_interface.c#L1500

for the 'cloverdetratio2light' parameters (which include a Hasenbusch mass shift tm_rho of the preconditioned fine system) messes with the null vectors.

It should be noted that the outer solver parameters definitely contain tm_rho:

https://github.com/etmc/tmLQCD/blob/77f264dcb334d696c848f7b5002fadf32f477287/quda_interface.c#L1780

If this is the culprit, the refresh procedure could be modified as follows:

1) categorically set tm_rho to zero in the outer solver parameters (quda_mg_param.invert_param->tm_rho = 0) before the refresh

2) set quda_mg_param.invert_param->mu = -quda_input.mg_setup_2kappamu/2.0/g_kappa, as in the initial setup generation (only relevant for non-clover twisted mass)

3) refresh setup

4) restore the parameters of the current solve and run TM_QUDA_MG_SETUP_UPDATE if necessary

5) move on to solving
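
To make the ordering of steps 1) to 4) concrete, here is a minimal sketch in C. It does not use the real QUDA or tmLQCD types: the structs `inv_param_sketch`, `mg_param_sketch` and `mg_state_sketch`, and the functions `refresh_setup_sketch`, `update_setup_sketch` and `refresh_with_neutral_params`, are hypothetical stand-ins introduced purely for illustration. Only the field names `mu` and `tm_rho` and the formula for `mu` are taken from the list above. The stub refresh just records which parameters it was called with, so the save / neutralise / refresh / restore pattern can be checked in isolation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the outer solver parameters; only the
   fields discussed in this issue are modelled. */
typedef struct {
  double mu;
  double tm_rho;
} inv_param_sketch;

/* Hypothetical stand-in for the MG parameter struct. */
typedef struct {
  inv_param_sketch *invert_param;
} mg_param_sketch;

/* Hypothetical bookkeeping in place of a real MG setup. */
typedef struct {
  int n_refresh;
  int n_update;
  double mu_at_refresh;
  double rho_at_refresh;
} mg_state_sketch;

/* Stub for the setup refresh: records the parameters it sees. */
static void refresh_setup_sketch(mg_state_sketch *st, const mg_param_sketch *mg) {
  st->n_refresh++;
  st->mu_at_refresh = mg->invert_param->mu;
  st->rho_at_refresh = mg->invert_param->tm_rho;
}

/* Stub for the setup update (TM_QUDA_MG_SETUP_UPDATE in the issue). */
static void update_setup_sketch(mg_state_sketch *st, const mg_param_sketch *mg) {
  (void)mg;
  st->n_update++;
}

/* Steps 1) to 4): refresh with tm_rho = 0 and the setup mu, then
   restore the parameters of the current solve. */
void refresh_with_neutral_params(mg_state_sketch *st, mg_param_sketch *mg,
                                 double mg_setup_2kappamu, double kappa,
                                 int need_update) {
  assert(st != NULL && mg != NULL && mg->invert_param != NULL);
  inv_param_sketch *ip = mg->invert_param;
  const inv_param_sketch saved = *ip;        /* remember the current solve */

  ip->tm_rho = 0.0;                          /* step 1 */
  ip->mu = -mg_setup_2kappamu / 2.0 / kappa; /* step 2 (non-clover TM only) */
  refresh_setup_sketch(st, mg);              /* step 3 */

  *ip = saved;                               /* step 4: restore parameters */
  if (need_update) {
    update_setup_sketch(st, mg);             /* step 4: update if required */
  }
  /* step 5, the actual solve, is left to the caller */
}
```

In the real code the two stubs would correspond to whatever refresh and update paths quda_interface.c already takes. The point of the sketch is only that the refresh never sees the Hasenbusch shift of the monomial that happens to trigger it.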

This might alleviate the convergence problems. Note that (1) may also matter for the initial generation of the setup if a monomial with tm_rho != 0 is the one that triggers the MG setup.

kostrzewa commented 2 years ago

Input file for a run where this can be observed: hmc_cA211.075.24_out_of_max_twist_start.input.txt

kostrzewa commented 2 years ago

Excerpt from a log file where this can be observed (look for "2000 iterations"): cA211.075.24_init_therm.749.log.excerpt.txt

kostrzewa commented 2 years ago

First try at fixing this: #509 (doesn't seem to help)