urbach closed this issue 2 years ago.
I think https://github.com/lattice/quda/pull/1136 makes clear what happens on the QUDA side.
The C-interface function is `computeGaugeForceQuda` (https://github.com/qcdcode/quda/blob/b681990fde2ea40de4e5e3637107c0c0becc1ee8/lib/interface_quda.cpp#L4142), and one needs to study the expected format of the output momentum field, which likely needs to be reordered in the same way that the gauge field is reordered (Z -> X, X -> Z) and which might already use `QUDA_RECONSTRUCT_8` or `QUDA_RECONSTRUCT_10` (the latter is for HISQ, I think), although it might also be stored in full `QUDA_RECONSTRUCT_18`.
The input gauge field, by contrast, is easy to take care of and just needs a call to `_loadGaugeQuda` from our `quda_interface.c`, which is a no-op if the gauge field on the device is current.
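The "no-op if current" behaviour can be sketched as a simple update-counter comparison: the host bumps a counter on every gauge update, and the upload is skipped when the device-side copy already matches. This is an illustrative pattern only; the names below (`load_gauge_if_stale`, the counters) are hypothetical, not the actual tmLQCD/QUDA interface.

```c
#include <assert.h>

/* Hypothetical sketch of the "no-op if the device copy is current" idea:
 * compare a host-side update counter against the value recorded at the
 * last upload, and skip the transfer when they match. */
static unsigned int host_gauge_counter = 0;   /* bumped on every HMC gauge update */
static unsigned int device_gauge_counter = 0; /* value at the last upload         */
static unsigned int n_uploads = 0;            /* counts actual transfers (demo)   */

static void upload_gauge_field(void) { n_uploads++; } /* stand-in for the copy */

void load_gauge_if_stale(void) {
  if (device_gauge_counter == host_gauge_counter)
    return;                                   /* device copy is current: no-op */
  upload_gauge_field();
  device_gauge_counter = host_gauge_counter;
}
```

The same bookkeeping would apply to the momentum field if it is kept resident on the device.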
Finally, I would propose introducing a wrapper function for the gauge derivative calculation which then hands off to QUDA (or another external library). Of course, one could also hand off right in `gauge_derivative`, at the cost of losing generality.
As discussed yesterday, one could also rename `UseExternalInverter` to `UseExternalLibrary` in the process to get more consistent parameter naming.
Then one could specify:
```
BeginMonomial GAUGE
  Type = Iwasaki
  Timescale = 0
  UseExternalLibrary = quda
EndMonomial
```
in the input file to offload the derivative to QUDA (note that the `UseExternalInverter` parameter is currently not parsed for `GAUGEMONOMIAL`).
The same game can also be played for computing the actual gauge energy, although I expect this to be a very minor part of the total.
Work should happen as a PR on top of https://github.com/etmc/tmLQCD/pull/491. Alternatively we can merge the latter in and have work take place in a PR on top of https://github.com/etmc/tmLQCD/pull/490.
Also keep in mind our kanban board https://github.com/etmc/tmLQCD/projects/2
The kanban is really cool! But, is it enough to drag and drop stuff across columns to get things actually done?
Apart from this, I have been exploring the `gauge_monomial.c` file. I would say that the switch to the QUDA gauge force calculation should replace almost the whole body of the `gauge_derivative` routine, with just some basic link reordering and the final call to `_trace_lambda_mul_add_assign`. What more?
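The wrapper proposed above could be little more than a per-monomial dispatch on the (hypothetical, not yet existing) `UseExternalLibrary` setting. The sketch below is an assumption about how such a hand-off might look; the enum, struct, and function names are all illustrative, not existing tmLQCD code.

```c
#include <assert.h>

/* Illustrative dispatch for the proposed wrapper: each monomial records
 * which external library (if any) should compute its derivative. */
typedef enum { EXT_LIB_NONE = 0, EXT_LIB_QUDA } ext_lib_t;

typedef struct {
  ext_lib_t use_ext_lib; /* parsed from the monomial's input-file section */
} monomial_stub;

/* stand-ins for the CPU path and the QUDA offload path; the return value
 * just identifies which backend ran (demo only) */
static int cpu_gauge_derivative(void)  { return 0; }
static int quda_gauge_derivative(void) { return 1; }

int gauge_derivative_wrapper(const monomial_stub *m) {
  switch (m->use_ext_lib) {
    case EXT_LIB_QUDA:
      /* here: reorder links, call computeGaugeForceQuda, project momenta */
      return quda_gauge_derivative();
    case EXT_LIB_NONE:
    default:
      return cpu_gauge_derivative();
  }
}
```

Keeping the dispatch in a wrapper rather than inside `gauge_derivative` is what would later allow the same mechanism to serve the gradient flow or other backends.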
I think @sbacchio should be included in this issue since he has already done some studies.
> The kanban is really cool! But, is it enough to drag and drop stuff across columns to get things actually done?
No, of course not, but it helps to keep an overview of what's going on (and ideally, of who is working on what...).
> Apart from this, I have been exploring the `gauge_monomial.c` file, I would say that the switch to the quda gauge force calculation should replace almost the whole body of the `gauge_derivative` routine, with just some basic link reordering, and the final call to `_trace_lambda_mul_add_assign`. What more?
I agree. It might be that even the trace is not necessary (as it might already be done by QUDA).
I've dug a bit; it looks like the top-level interface https://github.com/lattice/quda/blob/080cb1a83b13572df321b9be1891a9ff126c4e2d/include/quda.h#L1256 does even the full update of the momenta. If one calls the innermost routine, one might avoid this. I don't think that compression "10" is the kind of projection you aim at, though.
I didn't realize that my reply to this had not appeared. The `RECONSTRUCT_10` compression for momenta seems to be 9 numbers for the momentum, with the last one ignored or used for staggered actions or anisotropy: three imaginary numbers on the diagonal, three complex numbers on the off-diagonal.
Unfortunately, this can't simply be projected to what we want with what exists in QUDA (using one of the `copyGauge` instances), as `RECONSTRUCT_8` instead implements appendix A.2 of https://arxiv.org/pdf/0911.3191.pdf.
I'm not able to compile #490, lots of undefined global variables. Maybe because I don't have `tmlqcd_config.h` generated. But I don't understand why that is.
> I'm not able to compile #490, lots of undefined global variables. Maybe because I don't have `tmlqcd_config.h` generated. But I don't understand why that is.
1) Are you working in a fresh build directory?
2) Was the source code in a new directory (and did you run `autoconf` to generate a new configure file)?
The transition from `config.h` to `tmlqcd_config.h` and the inclusion of the auto-generated `tmlqcd_config_internal.h` is unfortunately rather precarious when working in an existing build directory, as I've changed which files are auto-generated and which are used from the source directory. This leads to a dependency mismatch in an existing directory.
In the build directory, `include/tmlqcd_config_internal.h` should exist and no other file. If there are other files there, delete them.
I had done this transition already at some point. The source code is not in a new directory, unfortunately; I'd lose all my local branches etc. if I did that.
Using a fresh build directory fixed the compilation, thanks!
How do we lose generality by handing off in `gauge_derivative` directly?
> How do we lose generality by handing off in `gauge_derivative` directly?
In the sense that writing a general wrapper function which does the hand-off forces one to think about a clean interface and might allow this to also be used for the gradient flow in the end. Also, one might consider writing bits and pieces of device code in simpler libraries (for architectures beyond accelerators supported by QUDA) and to employ these via the same mechanism.
The other point is that we will also need to update the momentum field from the fermionic monomials, so thinking about how to keep the momentum field on the device and the CPU side in sync is a good exercise. If one thinks about this only for the gauge monomial, one might have to duplicate a lot of code in the end.
Looking at the corresponding QUDA function, I'm not sure we can reuse this for the gradient flow. To me this function appears to do exactly what we do in `gauge_derivative`. But I'll have to check how the gradient flow is implemented in QUDA.
Keeping the momenta on the GPU and CPU in sync is important to think about. Currently, it seems to me that it is a separate issue, though.
> Looking at the corresponding QUDA function, I'm not sure we can reuse this for the gradient flow. To me this function appears to do exactly what we do in `gauge_derivative`. But I'll have to check how the gradient flow is implemented in QUDA.
Sure, the function discussed above is not suitable for the gradient flow. But in principle, once one has a clean interface for offloading the staple calculation, it can be re-used identically in our gradient flow routines, such that we don't actually have to mess around with QUDA's gradient flow and can instead still use our RK integrator with just the kernels handed off to the device.
> Keeping the momenta on the GPU and CPU in sync is important to think about. Currently, it seems to me that it is a separate issue, though.
I'm not sure. `computeGaugeForceQuda` and the corresponding fermionic force routines combine what we call the derivative with what is done in `update_momenta`. I think we can work around that, however, by passing `dt=1.0` to `computeGaugeForceQuda`. We will still have to project down to the 8-real representation after having done so.
Can we use this issue to organise ourselves on this topic? I'm still lacking a bit of an overview, so how do we best proceed here?
Also involves @Marcogarofalo and @sunpho84