QUDA inverter in the HMC

Marcogarofalo commented 2 years ago

Attempt to use the QUDA inverter in the HMC.

kostrzewa commented 2 years ago

@Marcogarofalo @sunpho84 this now has the first working call from solve_degenerate to the QUDA solvers. I think based on this it should now be mostly a matter of setting parameters properly to also get CLOVERDET running without mass preconditioning (there's an issue with reloading the clover field which I never resolved...).

For the determinant ratios:

1) The correct usage of the mu2 and kappa2 parameters for DETRATIO must be ensured.
2) Work in QUDA is required to support CLOVERDETRATIO due to its usage of the rho parameter. This is also required to use mass preconditioning in CLOVERDET.

For the RHMC, an interface to QUDA's multi-shift solver is required, together with the rescaling that is also done in the QPhiX interface.

Because the reordering routines will have some overhead, support should be added to QUDA to perform the gamma basis change on the device as well as the ability to load and store tmLQCD fields directly. See https://github.com/qcdcode/quda/issues/13

kostrzewa commented 2 years ago

To use the MG in the HMC, a lot more work is required as one needs to do setup evolution. There's a test in QUDA's test directories which demonstrates how this is done.

It also requires quite a bit of work to figure out the correct parameters to solve the eo-precon problem using direct solves. I think on that point we'll definitely have to talk to Kate.

kostrzewa commented 2 years ago

Just as a note: the MG solver "works" in this way, but I'm almost certain that it's not correct to use it like this (in terms of getting a good algorithm). I think the MG-preconditioner actually isn't doing anything to precondition the fine system. The number of outer iterations is currently at the level of a few hundred on a 24c32 lattice with 2kappamu = 0.1, which is simply a very poor GCR ;)

Marcogarofalo commented 2 years ago

> @Marcogarofalo @sunpho84 this now has the first working call from solve_degenerate to the QUDA solvers. I think based on this it should now be mostly a matter of setting parameters properly to also get CLOVERDET running without mass preconditioning (there's an issue with reloading the clover field which I never resolved...).

I think that the CLOVERDET monomial works fine in QUDA. I generated two configurations with sample-hmc-tmcloverdet.input with the following modification:

BeginMonomial CLOVERDET
  Timescale = 1
  2KappaMu = 0.01
  rho = 0.0
  CSW = 1.00
  kappa = 0.138
  AcceptancePrecision =  1.e-20
  ForcePrecision = 1.e-12
  Name = cloverdet
  useexternalinverter = quda
  usesloppyprecision = single
#  usecompression = 12
#  solver = CG
EndMonomial

The gauge configurations are the same up to 1e-6 and the only difference in the online measurements is

diff onlinemeas.000002 ../run_reference/onlinemeas.000002_tmcloverdet
8c8
< 6  1  1  9.685959e-02  -3.568352e-02
---
> 6  1  1  9.685960e-02  -3.568352e-02

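A comparison like the one above can also be done with a numerical tolerance instead of a textual diff, so that last-digit differences do not show up as mismatches. The following is a minimal sketch, assuming that every whitespace-separated field in the files is numeric; the function names and the default tolerances are illustrative choices and not part of tmLQCD.

```python
import math


def compare_numeric_lines(lines_a, lines_b, rel_tol=1e-6, abs_tol=1e-12):
    """Return (line_number, line_a, line_b) for every pair of lines that
    differs by more than the given tolerances.

    Assumption: each line consists only of whitespace-separated numbers.
    """
    if len(lines_a) != len(lines_b):
        raise ValueError("inputs have different numbers of lines")
    mismatches = []
    for n, (a, b) in enumerate(zip(lines_a, lines_b), start=1):
        fields_a, fields_b = a.split(), b.split()
        same = len(fields_a) == len(fields_b) and all(
            math.isclose(float(x), float(y), rel_tol=rel_tol, abs_tol=abs_tol)
            for x, y in zip(fields_a, fields_b)
        )
        if not same:
            mismatches.append((n, a.rstrip("\n"), b.rstrip("\n")))
    return mismatches


def compare_files(path_a, path_b, **tolerances):
    """Convenience wrapper that reads both files and compares them line by line."""
    with open(path_a) as fa, open(path_b) as fb:
        return compare_numeric_lines(fa.readlines(), fb.readlines(), **tolerances)
```

Called as `compare_files("onlinemeas.000002", "../run_reference/onlinemeas.000002_tmcloverdet")`, this returns an empty list when all entries agree within the chosen tolerance.
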
> For the RHMC, an interface to QUDA's multi-shift solver is required and the rescaling that is also done in the QPhiX interface.
>
> Because the reordering routines will have some overhead, support should be added to QUDA to perform the gamma basis change on the device as well as the ability to load and store tmLQCD fields directly. See qcdcode/quda#13

Is adding the tmLQCD gamma matrices to QUDA not an option?

kostrzewa commented 2 years ago

> I think that the CLOVERDET monomial works fine in quda. I generate two configurations with sample-hmc-tmcloverdet.input with the modification:

Yes, it works. The problem is that for the next trajectory, reloading the clover field hits an issue with the parameter struct which I know about but never resolved because we usually don't do multiple configs per run in analysis.

kostrzewa commented 2 years ago

> adding the tmLQCD gamma matrices in quda is not an option?

many parts in QUDA are hard-coded for algorithmic reasons: the DeGrand-Rossi basis is used in the MG (it's very similar to ours) because it's chiral (and this allows the chiralities to be split on the coarse grid) and the UKQCD (non-relativistic) basis is used elsewhere

the trick is to add the correct reorderings to QUDA (all the machinery is there). On the GPU, the reordering before a solve, for example, is a tiny overhead compared to actually sending the field to the device.
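
As an illustration of what such a reordering involves: a change of gamma basis is a fixed 4x4 unitary matrix applied to the spin index at every lattice site. The sketch below is host-side Python and only demonstrates the idea. The matrix `U` is a hypothetical stand-in and is not the actual transformation between the tmLQCD and QUDA bases, and the `field[site][spin][colour]` layout is likewise an assumption made for the example.

```python
import math

S = 1.0 / math.sqrt(2.0)

# Hypothetical 4x4 unitary standing in for a spin-basis change.
# It is NOT the real tmLQCD <-> QUDA transformation.
U = [
    [S, 0, -S, 0],
    [0, S, 0, -S],
    [S, 0, S, 0],
    [0, S, 0, S],
]


def dagger(mat):
    """Hermitian conjugate of a 4x4 matrix given as nested lists."""
    return [[complex(mat[j][i]).conjugate() for j in range(4)] for i in range(4)]


def rotate_spin(mat, field):
    """Apply a 4x4 spin matrix at every site, leaving colour untouched.

    Assumed layout: field[site][spin][colour] with 4 spins and 3 colours.
    """
    return [
        [
            [sum(mat[s][t] * site[t][c] for t in range(4)) for c in range(3)]
            for s in range(4)
        ]
        for site in field
    ]
```

A source would be rotated with `rotate_spin(U, psi)` before the solve and the solution rotated back with `rotate_spin(dagger(U), x)` afterwards. Because `U` is unitary, the round trip reproduces the original field up to rounding.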

kostrzewa commented 2 years ago

please note https://github.com/etmc/tmLQCD/issues/494#issuecomment-895895089

urbach commented 2 years ago

Is there anything speaking against merging this in and working directly on tmLQCD:quda_work_add_actions?

kostrzewa commented 2 years ago

> Is there anything speaking against merging this in and working directly on tmLQCD:quda_work_add_actions?

I wanted to give everyone a chance to follow what's happening by looking at the updates in this PR. I would say at this point we should merge it in and continue in a new PR (not in Marco's fork but directly in etmc/tmLQCD).

etmc / tmLQCD

QUDA inverter in the HMC #491