kostrzewa closed this 2 years ago.
Do not merge yet. I'm afraid there are other issues left...
Commit https://github.com/etmc/tmLQCD/pull/525/commits/6c040df7fb9c312d6adcabf672db9e031150bbd8 resolves #517 and solidifies the precision mismatch fix.
The problem was that the MG Setup (in particular, I guess, the coarse operators) seems to keep an internal copy of the gauge field device pointers, rather than an abstract reference, which I would expect to be updated along with the gauge field on the device.
When we call `freeGaugeQuda()` in the HMC, we are left with dangling pointers in the MG Setup, and this is what causes the crazy "volume mismatches". At the same time, the current gauge and clover fields must be consistent with the precisions used in the MG Setup, and violating this leads to the precision mismatches.
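To make the dangling-pointer failure mode concrete, here is a minimal sketch in plain C, with host memory standing in for device memory. `device_gauge_t`, `mg_setup_t`, the generation counter and all helpers are hypothetical illustrations, not tmLQCD or QUDA types. The MG setup caches the raw pointer it was built with; a free/reload of the gauge field invalidates that pointer, and a generation number recorded at build time lets the caller detect the stale setup and refresh it, which is roughly the "update the MG Setup after every reload" strategy discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the gauge field resident on the device. */
typedef struct {
  double *data;        /* "device" buffer (plain host memory here) */
  size_t volume;
  unsigned generation; /* bumped on every (re)load */
} device_gauge_t;

/* Hypothetical stand-in for the MG setup: it keeps the raw pointer it was built with. */
typedef struct {
  const double *cached_gauge;
  unsigned built_for_generation;
} mg_setup_t;

/* Analogue of freeGaugeQuda() followed by a reload: the old buffer is released
   and a new one allocated, so any pointer cached in the MG setup now dangles. */
static void reload_gauge(device_gauge_t *g, size_t volume) {
  free(g->data);
  g->data = calloc(volume, sizeof *g->data);
  assert(g->data != NULL);
  g->volume = volume;
  g->generation++;
}

static void build_mg_setup(mg_setup_t *mg, const device_gauge_t *g) {
  mg->cached_gauge = g->data;
  mg->built_for_generation = g->generation;
}

/* Comparing mg->cached_gauge with g->data is NOT a reliable staleness check:
   after free() the old pointer value is indeterminate, and the allocator may
   even hand back the same address. The generation counter avoids both issues. */
static bool mg_setup_is_current(const mg_setup_t *mg, const device_gauge_t *g) {
  return mg->built_for_generation == g->generation;
}

/* Refreshes the setup if the gauge field was reloaded since it was built.
   Returns true if an update was necessary. */
static bool ensure_mg_setup(mg_setup_t *mg, const device_gauge_t *g) {
  if (mg_setup_is_current(mg, g)) return false;
  build_mg_setup(mg, g);
  return true;
}
```

The cost of this approach is one setup update per reload, which is the trade-off accepted in the comment that follows.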
I'm not happy with this because it induces lots of MG Setup updates, but these are not THAT expensive. I think this is ready to test now.
but it works. Also all the valgrind messages disappear. Thanks
Thanks for the test! It would be interesting to see how a profile with this code compares to the profiles that you generated a while ago.
Thanks for all the tests, valgrind runs and tentative workarounds, @sunpho84 @simone-romiti @Marcogarofalo @pittlerf. Without your hints I would not have been able to fix this...
We were not tracking the precisions of the gauge and clover fields present on the device, and hence this was bound to lead to mismatches when switching from one monomial employing an operator and solver with a particular set of precisions to another monomial employing a different set of precisions (for `cuda_prec`, `cuda_prec_sloppy`, `cuda_prec_refinement_sloppy`, `cuda_prec_precondition` and `cuda_prec_eigensolver`, and the corresponding ones for `clover_quda_*`).

Unfortunately, this does not fully "solve" #517, but it does raise a new type of error there, which might help resolve that too in the end.
The set of changes here has a drawback, of course: the gauge and clover fields are reloaded much more frequently, instead of just having the missing precision instantiated from the existing double-precision field on the device. I'm not sure how large the additional overhead is compared to the time spent in a trajectory.
It is especially a complete waste of time to call `reorder_gauge_toQuda` so frequently, because this should really only be called when `g_gauge_field` or any of the theta angles have actually changed. `freeGaugeQuda()` and `loadGaugeQuda(..)` do have to be called, however (at least for now, since that is our mechanism for ensuring that the field is up to date).