etmc / tmLQCD

tmLQCD is a freely available software suite providing a set of tools to be used in lattice QCD simulations. It is mainly an HMC implementation (including PHMC and RHMC) for Wilson, Wilson clover and Wilson twisted mass fermions, together with inverters for different versions of the Dirac operator. The code is fully parallelised and ships with optimisations for various modern architectures, such as commodity PC clusters and the Blue Gene family.
http://www.itkp.uni-bonn.de/~urbach/software.html
GNU General Public License v3.0

change mechanism by which QUDA MG state's dependence on state of gauge (and clover) field is tracked #529

Closed kostrzewa closed 2 years ago

kostrzewa commented 2 years ago

This fixes one more bug: yet another instance of a dangling pointer in the MG setup. It re-appeared after another part of the logic was fixed in 221bf0980da0128df8d9f2702f65f7ab1cc19c69.

As a nice side effect, it reduces the number of Setup updates that should occur during the HMC (the setup will only be updated when necessary).
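To illustrate the idea of tying the MG setup to the state of the fields it was built from, here is a minimal sketch in C. It is not the code from this PR: all type and function names are hypothetical, and the only detail taken from the discussion is that the gauge field state is labelled by a floating-point `gauge_id` (as in the log line `gauge_id: 6.001000`). The clover id is an assumed analogue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping record; not the tmLQCD data structure. */
typedef struct {
  int    initialised; /* non-zero once a setup has been generated */
  double gauge_id;    /* id of the gauge field the setup was built from */
  double clover_id;   /* id of the clover field the setup was built from (assumed) */
} mg_state_t;

/* Compare two floating-point ids with a small tolerance. */
static int ids_differ(double a, double b) {
  double d = a - b;
  if (d < 0.0) d = -d;
  return d > 1e-9;
}

/* Returns 1 if the setup must be (re)generated, 0 if it can be reused. */
int mg_setup_is_stale(const mg_state_t *s, double gauge_id, double clover_id) {
  assert(s != NULL);
  if (!s->initialised) return 1;
  return ids_differ(s->gauge_id, gauge_id) || ids_differ(s->clover_id, clover_id);
}

/* Record which field state a freshly generated setup belongs to. */
void mg_setup_mark_current(mg_state_t *s, double gauge_id, double clover_id) {
  assert(s != NULL);
  s->initialised = 1;
  s->gauge_id = gauge_id;
  s->clover_id = clover_id;
}

/* Call whenever the setup is destroyed, so that a stale handle is never reused. */
void mg_setup_invalidate(mg_state_t *s) {
  assert(s != NULL);
  s->initialised = 0;
}
```

In this sketch the caller would check `mg_setup_is_stale()` before every solve, regenerate and call `mg_setup_mark_current()` only when it returns 1, and call `mg_setup_invalidate()` at every point where the setup is freed. Keeping the "is it valid" flag next to the ids is what prevents reuse of a destroyed setup.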

kostrzewa commented 2 years ago

@marcuspetschlies could you please test whether this keeps the setup reuse as with 221bf0980da0128df8d9f2702f65f7ab1cc19c69? (The setup should only be updated when necessary.)

Marcogarofalo commented 2 years ago

Hi, I observe that with this commit it is necessary to add

BeginOperator CLOVER
  ...
  usesloppyprecision = single
EndOperator

when useexternalinverter = quda is set, whereas before it worked without specifying the precision.

kostrzewa commented 2 years ago

> Hi, I observe that with this commit it is necessary to add

Can you elaborate on what you mean by "necessary"? The MG should also work fully in double precision (albeit inefficiently).

Marcogarofalo commented 2 years ago

Nothing, sorry. I was convinced that the default was usesloppyprecision = single; instead, I guess it is double (= 8). With double precision, it produces the error

# TM_QUDA: Time for loadCloverQuda 4.007330e-04 s level: 3 proc_id: 0 /HMC/correlators_measurement/invert_eo_quda/loadCloverQuda
# TM_QUDA: mu = 0.001200000000, kappa = 0.140065000000, csw = 1.740000000000
# TM_QUDA: using MG solver to invert operator with 2kappamu = 0.000336156000
# TM_QUDA: MG level 0, extent of (xyzt) dim 0: 8
# TM_QUDA: MG aggregation size set to: 1
# TM_QUDA: MG level 0, extent of (xyzt) dim 1: 8
# TM_QUDA: MG aggregation size set to: 2
# TM_QUDA: MG level 0, extent of (xyzt) dim 2: 8
# TM_QUDA: MG aggregation size set to: 2
# TM_QUDA: MG level 0, extent of (xyzt) dim 3: 4
# TM_QUDA: MG aggregation size set to: 2
# TM_QUDA: MG setting coarse mu scaling factor on level 0 to 1.000000
# TM_QUDA: MG setting coarse mu scaling factor on level 1 to 1.000000
# TM_QUDA: Destroying MG Preconditioner Setup
# TM_QUDA: Performing MG Preconditioner Setup for gauge_id: 6.001000
# TM_QUDA: Generating MG Setup with mu = 0.001199994860 instead of 0.001200000000
MG level 0 (GPU): ERROR: Precisions 4 8 do not match (/qbigwork/garofalo/quda/lib/coarse_op.cu:165 in calculateY())
 (rank 0, host lnode15.cluster.hiskp, lattice_field.h:795 in Precision_())

I noticed this because I forgot to add usesloppyprecision = single. I am not sure whether this should be addressed; maybe it is not relevant for physical applications.

kostrzewa commented 2 years ago

> I noticed this because I forgot to add usesloppyprecision = single. I am not sure whether this should be addressed; maybe it is not relevant for physical applications.

It's certainly a bug and may indicate that we are setting some of the precisions incorrectly. Thanks. I've created a new issue to track this --> #530
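For readers of the log: the values 4 and 8 in "Precisions 4 8 do not match" are consistent with labelling precisions by their width in bytes (single = 4, double = 8). The following is only a sketch of the kind of guard that could catch such a mismatch before the MG setup is generated. The enum, the function names and the message format are hypothetical and are defined locally here; they are not taken from QUDA or tmLQCD.

```c
#include <stdio.h>

/* Local stand-in: precisions labelled by their width in bytes. */
typedef enum { PREC_HALF = 2, PREC_SINGLE = 4, PREC_DOUBLE = 8 } prec_t;

/* Human-readable name for a precision width. */
const char *prec_name(int p) {
  switch (p) {
    case PREC_HALF:   return "half";
    case PREC_SINGLE: return "single";
    case PREC_DOUBLE: return "double";
    default:          return "unknown";
  }
}

/* Returns 0 if the two precisions agree, 1 otherwise (and reports the mismatch). */
int prec_check(const char *what, int expected, int actual) {
  if (expected == actual) return 0;
  fprintf(stderr, "%s: precisions %d (%s) and %d (%s) do not match\n",
          what, expected, prec_name(expected), actual, prec_name(actual));
  return 1;
}
```

A call such as `prec_check("mg_setup", PREC_SINGLE, PREC_DOUBLE)` returns 1 and names both precisions, which makes it easier to see which parameter was left at its default.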

kostrzewa commented 2 years ago

@Finkenrath this will reduce the number of MG updates in the HMC compared to 221bf0980da0128df8d9f2702f65f7ab1cc19c69