When QUDA is used in the HMC, it cannot be used for online measurements

kostrzewa commented 2 years ago

There seems to be a bug right now which messes up the QUDA parameters when used in both the HMC for the single parity solve as well as for the online measurement. Probably just some silly oversight.

kostrzewa commented 2 years ago

Found the bugger: it had to do with the gamma basis used for single-parity solves, which was not reset for standard solves.

kostrzewa commented 2 years ago

I'm afraid this is back. Not quite sure what's causing it now...

kostrzewa commented 2 years ago

@simone-romiti Would you be willing to investigate this?

simone-romiti commented 2 years ago

Please see https://github.com/etmc/tmLQCD/pull/519

kostrzewa commented 2 years ago

@simone-romiti Were you able to confirm the issue with the residue in the meantime?

simone-romiti commented 2 years ago

I wasn't able to reproduce that residue issue, but I have the feeling it came from multiple inconsistent definitions of c_sw in the input file.

kostrzewa commented 2 years ago

Can you explain what you did to attempt a reproduction? In my tests the residual in an online measurement during the HMC is wrong. Are you saying that you have an output file of an HMC run where the residue for the online measurement is such that the solve appears to have converged correctly?

kostrzewa commented 2 years ago

> I have the feeling it came from multiple inconsistent definitions of c_sw in the input file.

We've already excluded this as the culprit, as I never had inconsistent definitions in my input files.

Marcogarofalo commented 2 years ago

I think that this may be solved by #525. For a very small lattice with the settings

T=8
L=4
Measurements = 4
Startcondition = hot
InitialStoreCounter = 0

I compared a run on the host using CG (stored in onlinemeas.000003_ref) with a run on the device using MG:

garofalo@qbig:/qbigwork/garofalo/tmLQCD/build$ sdiff onlinemeas.000003 onlinemeas.000003_ref 
1  1  0  5.566760e+01  0.000000e+00                             1  1  0  5.566760e+01  0.000000e+00
1  1  1  6.254904e+00  6.067758e+00                             1  1  1  6.254904e+00  6.067758e+00
1  1  2  1.112194e+00  8.851545e-01                             1  1  2  1.112194e+00  8.851545e-01
1  1  3  1.655930e-01  1.519360e-01                             1  1  3  1.655930e-01  1.519360e-01
1  1  4  5.293340e-02  0.000000e+00                             1  1  4  5.293340e-02  0.000000e+00
2  1  0  -6.644760e-01  0.000000e+00                            2  1  0  -6.644760e-01  0.000000e+00
2  1  1  2.595605e+00  -3.244143e+00                            2  1  1  2.595605e+00  -3.244143e+00
2  1  2  5.945404e-01  -4.227463e-01                            2  1  2  5.945404e-01  -4.227463e-01
2  1  3  7.036322e-02  -6.507322e-02                            2  1  3  7.036322e-02  -6.507322e-02
2  1  4  5.109316e-03  0.000000e+00                             2  1  4  5.109316e-03  0.000000e+00
6  1  0  -2.165718e+00  0.000000e+00                            6  1  0  -2.165718e+00  0.000000e+00
6  1  1  5.403535e-02  -1.337601e-01                            6  1  1  5.403535e-02  -1.337601e-01
6  1  2  -1.611790e-02  1.086402e-02                          | 6  1  2  -1.611791e-02  1.086402e-02
6  1  3  7.483443e-03  6.764100e-04                           | 6  1  3  7.483443e-03  6.764102e-04
6  1  4  -1.734317e-03  0.000000e+00                            6  1  4  -1.734317e-03  0.000000e+00
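For reference, a comparison like the sdiff above can also be done with a numerical tolerance, so that last-digit differences between the CG and MG solves are not reported as mismatches. The following is a minimal sketch and not part of tmLQCD: the helper names `parse_rows` and `max_rel_diff` are hypothetical, and the only assumption about the file format is that each row consists of three integer labels followed by two floating-point values, as in the output shown.

```python
def parse_rows(text):
    """Map (label1, label2, label3) -> (value1, value2) for each non-empty row."""
    rows = {}
    for line in text.strip().splitlines():
        fields = line.split()
        key = tuple(int(f) for f in fields[:3])
        rows[key] = tuple(float(f) for f in fields[3:5])
    return rows


def max_rel_diff(a, b):
    """Largest relative difference over all values of rows present in both inputs."""
    worst = 0.0
    for key in a.keys() & b.keys():
        for x, y in zip(a[key], b[key]):
            scale = max(abs(x), abs(y))
            if scale > 0.0:
                worst = max(worst, abs(x - y) / scale)
    return worst


# The two rows flagged by sdiff above (left: device/MG, right: host/CG reference).
mg = parse_rows("""
6  1  2  -1.611790e-02  1.086402e-02
6  1  3  7.483443e-03  6.764100e-04
""")
cg = parse_rows("""
6  1  2  -1.611791e-02  1.086402e-02
6  1  3  7.483443e-03  6.764102e-04
""")

worst = max_rel_diff(mg, cg)
print(f"largest relative difference: {worst:.1e}")
```

For the two flagged rows the largest relative difference is below 1e-6, i.e. the files differ only in the last printed digit.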

Also, the residue looks ok to me

# TM_QUDA: Updating MG Preconditioner Setup for gauge_id: 0.044000
# TM_QUDA: Time for MG_Preconditioner_Setup_Update 2.376449e-02 s level: 3 proc_id: 0 /HMC/correlators_measurement/invert_eo_quda/MG_Preconditioner_Setup_Update
# TM_QUDA: Time for reorder_spinor_toQuda 2.507400e-05 s level: 3 proc_id: 0 /HMC/correlators_measurement/invert_eo_quda/reorder_spinor_toQuda
Source: 768
Prepared source = 673.325
Prepared solution = 0
Prepared source post mass rescale = 673.325
Creating a GCR solver
GCR:     0 iterations, <r,r> = 6.733246e+02, |r|/|b| = 1.000000e+00
GCR:     1 iterations, <r,r> = 8.976795e-01, |r|/|b| = 3.651307e-02
GCR:     2 iterations, <r,r> = 3.241824e-03, |r|/|b| = 2.194232e-03
GCR:     3 iterations, <r,r> = 1.035675e-05, |r|/|b| = 1.240222e-04
GCR:     4 iterations, <r,r> = 3.436469e-08, |r|/|b| = 7.144041e-06
GCR (restart):     1 iterations, <r,r> = 3.438203e-08, |r|/|b| = 7.145843e-06
GCR:     5 iterations, <r,r> = 9.722416e-11, |r|/|b| = 3.799923e-07
GCR:     6 iterations, <r,r> = 2.979178e-13, |r|/|b| = 2.103468e-08
GCR:     7 iterations, <r,r> = 9.627888e-16, |r|/|b| = 1.195785e-09
GCR: number of restarts = 1
GCR: Convergence at 7 iterations, L2 relative residual: iterated = 1.195750e-09, true = 1.195750e-09 (requested = 3.853787e-09)
Solution = 1781.02
Reconstructed solution: 2251.46
# TM_QUDA: Time for invertQuda 1.998980e-02 s level: 3 proc_id: 0 /HMC/correlators_measurement/invert_eo_quda/invertQuda

kostrzewa commented 2 years ago

> Also, the residue looks ok to me

Awesome. Do you still have the next line(s) of the output, which should contain tmLQCD's residual check (rather than QUDA's residual, which always appeared to be correct)?

Marcogarofalo commented 2 years ago

Maybe tmLQCD computes the squared residue:

Reconstructed solution: 2251.46
# TM_QUDA: Time for invertQuda 1.998980e-02 s level: 3 proc_id: 0 /HMC/correlators_measurement/invert_eo_quda/invertQuda
# TM_QUDA: Done: 7 iter / 0.018373 secs = 78.5877 Gflops
# TM_QUDA: Time for reorder_spinor_fromQuda 2.966800e-05 s level: 3 proc_id: 0 /HMC/correlators_measurement/invert_eo_quda/reorder_spinor_fromQuda
# TM_QUDA: Time for invert_eo_quda 4.488409e-02 s level: 2 proc_id: 0 /HMC/correlators_measurement/invert_eo_quda
# Inversion done in 7 iterations, squared residue = 7.625053e-16!
# Inversion done in 4.74e-02 sec.
# : Time for correlators_measurement 4.907240e-02 s level: 1 proc_id: 0 /HMC/correlators_measurement

kostrzewa commented 2 years ago

> Maybe tmLQCD computes the squared residue

Yes, and it's always the absolute residual (not the relative one). This looks good, thanks!
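To make the relation between the two kinds of numbers explicit: QUDA prints the relative residual |r|/|b|, and the squared absolute residual follows as |r|^2 = (|r|/|b|)^2 * |b|^2. A quick sanity check with the GCR values from the log above (a sketch for illustration only; the variable names are made up for this example):

```python
import math

b_norm2 = 6.733246e+02   # <r,r> at iteration 0, i.e. |b|^2 for a zero initial guess
rel_res = 1.195785e-09   # |r|/|b| printed by GCR at iteration 7

# Convert the relative residual into a squared absolute residual.
r_norm2 = rel_res**2 * b_norm2
print(f"squared absolute residual: {r_norm2:.6e}")

# GCR itself printed <r,r> = 9.627888e-16 at iteration 7.
assert math.isclose(r_norm2, 9.627888e-16, rel_tol=1e-5)
```

The value reported by tmLQCD's check (7.625053e-16) is of the same order of magnitude. That check is presumably evaluated independently of QUDA's internal residual, so exact agreement between the two numbers is not expected.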

kostrzewa commented 2 years ago

I'm afraid this is still a problem for me (in the sense that neither MG nor CG converges in the online measurement as part of an nf=2+1+1 HMC)...

This is CG (which converges according to QUDA but seemingly to the wrong result according to the residual check):

 $ tail -f log_1645123984.out | grep residue 
# Inversion done in 14635 iterations, squared residue = 6.294779e+04!
# Inversion done in 10410 iterations, squared residue = 5.710302e+04!
# Inversion done in 10097 iterations, squared residue = 5.745964e+04!
# Inversion done in 8723 iterations, squared residue = 5.550602e+04!
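The same filtering as the `grep` above can be done with a threshold, so that only solves failing the residual check are listed. This is a rough sketch: the function name `failed_solves` is hypothetical and the tolerance of 1e-10 is an arbitrary value chosen for illustration, not a tmLQCD default.

```python
import re

# Matches tmLQCD's residual-check line as it appears in the log above.
PATTERN = re.compile(
    r"Inversion done in (\d+) iterations, squared residue = ([0-9.eE+-]+)!"
)


def failed_solves(log_text, tolerance):
    """Return (iterations, squared_residue) for each solve whose residue exceeds tolerance."""
    hits = []
    for match in PATTERN.finditer(log_text):
        iterations, residue = int(match.group(1)), float(match.group(2))
        if residue > tolerance:
            hits.append((iterations, residue))
    return hits


log = """\
# Inversion done in 14635 iterations, squared residue = 6.294779e+04!
# Inversion done in 10410 iterations, squared residue = 5.710302e+04!
# Inversion done in 10097 iterations, squared residue = 5.745964e+04!
# Inversion done in 8723 iterations, squared residue = 5.550602e+04!
"""

bad = failed_solves(log, tolerance=1e-10)
for iterations, residue in bad:
    print(f"{iterations:6d} iterations, squared residue {residue:.6e}")
```

All four solves in the excerpt are flagged, since their squared residues are of order 1e+04.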

kostrzewa commented 2 years ago

Alright, as discussed, here's a minimal reproducer. No MG, just CG. ~The problem appears when the NDCLOVERRAT monomial is added, so there must be some leftover parameter switch which we don't take into account.~ nope, this is not the reason

It's independent of the order in which the monomials are specified and also independent of use_even_odd for the CLOVER operator used in the online measurement.

T=16
L=4
Measurements = 50
Startcondition = hot
InitialStoreCounter = 0 
#Startcondition = continue
#InitialStoreCounter = readin
2KappaMu = 0.0015846837
CSW = 1.76
kappa = 0.15
NSave = 1
ThetaT = 1.0
UseEvenOdd = yes
ReversibilityCheck = no
ReversibilityCheckIntervall = 100
DebugLevel = 3
ompnumthreads = 6

BeginIntegrator 
  Type0 = 2MN
  Type1 = 2MN
  IntegrationSteps0 = 1
  IntegrationSteps1 = 2
  tau = 0.1
  Lambda0 = 0.19
  Lambda1 = 0.20
  NumberOfTimescales = 2
  MonitorForces = no
EndIntegrator

BeginMonomial GAUGE
  Type = Wilson
  beta = 5.60
  Timescale = 0
  UseExternalLibrary = quda
EndMonomial

BeginOperator CLOVER
  CSW = 1.76
  kappa = 0.15
  2kappamu = 0.0015846837
  SolverPrecision = 1e-14
  MaxSolverIterations = 10000
  solver = cg
  UseEvenOdd = yes
  useexternalinverter = quda
  usesloppyprecision = single
EndOperator

BeginMeasurement CORRELATORS
  Frequency = 1
EndMeasurement

BeginMonomial CLOVERDET
  Timescale = 1
  kappa = 0.15
  2KappaMu = 0.0015846837
  CSW = 1.76
  rho = 0.09353509
  MaxSolverIterations = 10000
  AcceptancePrecision =  1.e-19
  ForcePrecision = 1.e-15
  Name = cloverdetlight
  solver = cg
  useexternalinverter = quda
  usesloppyprecision = half
EndMonomial
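Since inconsistent c_sw definitions were suggested earlier as a possible cause, here is a small sketch for checking that an input file like the one above uses the same values everywhere. It is not a tmLQCD tool: the function name `collect` is hypothetical, and treating the keys as case-insensitive is an assumption based on the input above using both `2KappaMu` and `2kappamu`. The embedded text is an abbreviated excerpt of the reproducer.

```python
from collections import defaultdict

# Abbreviated excerpt of the reproducer input above.
INPUT = """\
2KappaMu = 0.0015846837
CSW = 1.76
kappa = 0.15
#Startcondition = continue

BeginOperator CLOVER
  CSW = 1.76
  kappa = 0.15
  2kappamu = 0.0015846837
EndOperator

BeginMonomial CLOVERDET
  kappa = 0.15
  2KappaMu = 0.0015846837
  CSW = 1.76
EndMonomial
"""


def collect(text, keys):
    """Collect the set of numerical values given for each key (case-insensitive)."""
    found = defaultdict(set)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower() in keys:
            found[key.lower()].add(float(value))
    return found


values = collect(INPUT, {"csw", "kappa", "2kappamu"})
inconsistent = {key: vals for key, vals in values.items() if len(vals) > 1}
print("inconsistent keys:", sorted(inconsistent) or "none")
```

For the reproducer all three parameters are defined consistently, in line with the statement above that inconsistent definitions are not the cause.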

kostrzewa commented 2 years ago

Evolving an HMC for 49 trajectories on a 4c16 lattice works nicely using the following integrator:

BeginIntegrator 
  Type0 = 2MN 
  Type1 = 2MN 
  IntegrationSteps0 = 1 
  IntegrationSteps1 = 2 
  tau = 0.1 
  Lambda0 = 0.19
  Lambda1 = 0.20
  NumberOfTimescales = 2 
  MonitorForces = no
EndIntegrator

Running this trajectory, once using tmLQCD to solve for the online measurement and once using QUDA, I get the following correlators at trajectory 49 (tmLQCD left, QUDA right):

1  1  0  4.905795e+01  0.000000e+00     | 1  1  0  5.030253e+01  0.000000e+00
1  1  1  8.515074e+00  8.015892e+00     | 1  1  1  1.330971e+01  8.459627e+00
1  1  2  1.783359e+00  1.638909e+00     | 1  1  2  5.356934e+00  1.913101e+00
1  1  3  7.024739e-01  3.899849e-01     | 1  1  3  2.644079e+00  5.004493e-01
1  1  4  1.835661e-01  1.451663e-01     | 1  1  4  9.142225e-01  1.763050e-01
1  1  5  3.883558e-02  4.926942e-02     | 1  1  5  3.148795e-01  7.100453e-02
1  1  6  1.413912e-02  1.563275e-02     | 1  1  6  1.260886e-01  2.194228e-02
1  1  7  7.428324e-03  5.555699e-03     | 1  1  7  5.051833e-02  1.223092e-02
1  1  8  4.436583e-03  0.000000e+00     | 1  1  8  2.127422e-02  0.000000e+00
2  1  0  6.288558e-01  0.000000e+00     | 2  1  0  -4.111925e+00  0.000000e+00
2  1  1  9.463874e-01  -1.931518e+00    | 2  1  1  -5.112575e+00  2.777394e+00
2  1  2  5.985951e-01  -3.280192e-01    | 2  1  2  -2.059360e+00  4.171696e-01
2  1  3  1.917542e-01  -1.381770e-01    | 2  1  3  -5.619905e-01  1.655978e-01
2  1  4  5.317374e-02  -4.361940e-02    | 2  1  4  -3.101735e-01  5.678142e-02
2  1  5  1.265858e-02  -1.612433e-02    | 2  1  5  -9.021375e-02  2.120072e-02
2  1  6  3.855478e-03  -3.966485e-03    | 2  1  6  -1.895364e-02  7.009182e-03
2  1  7  2.302083e-03  -7.334878e-04    | 2  1  7  -1.174924e-02  4.104665e-04
2  1  8  2.625197e-04  0.000000e+00     | 2  1  8  -6.862359e-03  0.000000e+00
6  1  0  -2.080063e+00  0.000000e+00    | 6  1  0  -4.903313e+00  0.000000e+00
6  1  1  3.507670e-01  -1.665105e-01    | 6  1  1  -3.677945e-01  -2.357396e-01
6  1  2  -1.166227e-02  3.013275e-03    | 6  1  2  1.177670e-01  -6.271045e-02
6  1  3  1.286590e-02  -4.614345e-03    | 6  1  3  5.833267e-03  -4.283882e-02
6  1  4  2.265724e-03  -9.804648e-06    | 6  1  4  4.505630e-02  5.888295e-03
6  1  5  1.200649e-03  2.680927e-03     | 6  1  5  3.778743e-03  2.597529e-03
6  1  6  -3.611065e-05  1.501504e-04    | 6  1  6  9.812108e-03  -5.020652e-04
6  1  7  6.634757e-04  1.901951e-04     | 6  1  7  -1.538161e-04  -5.964235e-04
6  1  8  1.173816e-04  0.000000e+00     | 6  1  8  1.044908e-03  0.000000e+00

The trajectories themselves were reproduced exactly, however (note that all derivatives were still computed via QUDA in both cases).

kostrzewa commented 2 years ago

Okay, I've found the bugger, now for real.

The issue was the following: the solver interface(s) for the monomials set inv_param.dagger = QUDA_DAG_YES when certain solvers are used (CG, for example), and this setting was then still in effect for the operator solve in the online measurement, which requires inv_param.dagger = QUDA_DAG_NO. This also explains why your example, @Marcogarofalo, worked, while my example above (https://github.com/etmc/tmLQCD/issues/495#issuecomment-1048663230) did not: when the MG is used in the monomial, inv_param.dagger = QUDA_DAG_NO is set, which is also what the operator solve for the online measurement requires.

See #528
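The failure mode can be illustrated with a toy model. This is only a sketch of the mechanism described above, not the tmLQCD or QUDA code: apart from the names `inv_param.dagger`, `QUDA_DAG_YES` and `QUDA_DAG_NO`, which are taken from this thread, everything here (the `InvParam` class, the 2x2 matrix, the function names) is hypothetical.

```python
from dataclasses import dataclass

DAG_NO, DAG_YES = "QUDA_DAG_NO", "QUDA_DAG_YES"  # labels as used in this thread


@dataclass
class InvParam:
    """Hypothetical stand-in for a solver parameter struct shared by all solves."""
    dagger: str = DAG_NO


inv_param = InvParam()          # single shared instance
M = [[2.0, 1.0], [0.0, 3.0]]    # small non-symmetric toy "operator"


def transpose(a):
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]


def solve(b, param):
    """Solve A x = b exactly, with A = M^T if the dagger flag is set, else A = M."""
    a = transpose(M) if param.dagger == DAG_YES else M
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def squared_residual(x, b):
    """External check against the undaggered operator: |b - M x|^2."""
    r = [b[i] - (M[i][0] * x[0] + M[i][1] * x[1]) for i in range(2)]
    return r[0] ** 2 + r[1] ** 2


def monomial_solve(b):
    inv_param.dagger = DAG_YES  # set by the monomial interface, e.g. for CG
    return solve(b, inv_param)


def measurement_solve(b, reset):
    if reset:
        inv_param.dagger = DAG_NO  # explicitly restore what this solve requires
    return solve(b, inv_param)


b = [1.0, 1.0]
monomial_solve(b)
stale = squared_residual(measurement_solve(b, reset=False), b)
fixed = squared_residual(measurement_solve(b, reset=True), b)
print(f"stale flag: {stale:.3e}, reset flag: {fixed:.3e}")
```

In the stale case the solver still returns an exact solution of the system it was asked to solve, which mirrors the observation that the solve converges according to QUDA while the external residual check against the intended operator fails.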

kostrzewa commented 2 years ago

Resolved via #528
