kostrzewa closed this 2 years ago.
Note that I've also reintroduced `TM_QUDA_EXPERIMENTAL`. When it is disabled (`--disable-quda_experimental`), this branch can be built against the 1.1.x branch of QUDA (or `develop`), but it then does not support 1+1 twisted-clover or twisted-clover mass preconditioning. To compile against the `feature/ndeg-twisted-clover` QUDA branch with all features, call `configure` with `--enable-quda_experimental`.
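A minimal sketch of how such a build-time switch typically gates functionality, assuming `configure` defines the `TM_QUDA_EXPERIMENTAL` macro when `--enable-quda_experimental` is given. The helper names `have_ndeg_twisted_clover`, `have_tmclover_mass_precon` and `require_ndeg_twisted_clover` are hypothetical and not part of tmLQCD.

```c
#include <stdio.h>

/* Hypothetical sketch, not tmLQCD code. Assumes TM_QUDA_EXPERIMENTAL is
 * defined by configure when --enable-quda_experimental is passed, i.e. when
 * building against the feature/ndeg-twisted-clover QUDA branch. */

/* 1 if 1+1 (non-degenerate) twisted-clover solves are compiled in. */
int have_ndeg_twisted_clover(void) {
#ifdef TM_QUDA_EXPERIMENTAL
  return 1;
#else
  return 0; /* building against QUDA 1.1.x or develop */
#endif
}

/* 1 if twisted-clover mass preconditioning is compiled in. */
int have_tmclover_mass_precon(void) {
#ifdef TM_QUDA_EXPERIMENTAL
  return 1;
#else
  return 0;
#endif
}

/* Guard a solver wrapper could call before handing a 1+1 twisted-clover
 * problem to QUDA: returns 0 if supported, -1 (with a message) otherwise. */
int require_ndeg_twisted_clover(void) {
  if (!have_ndeg_twisted_clover()) {
    fprintf(stderr, "1+1 twisted-clover requires building with "
                    "--enable-quda_experimental\n");
    return -1;
  }
  return 0;
}
```

Failing loudly at run time, rather than silently falling back, makes it obvious when an input file requests a feature the build cannot provide.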
QUDA issue asking for help: https://github.com/lattice/quda/issues/1170
This works now. `MGRefreshSetupMaxSolverIterations` and `MGRefreshSetupMDUThreshold` are two more parameters to be tuned in runs. Note that `MGResetSetupMDUThreshold` also needs to be set appropriately (to the trajectory length, for example) to prevent unwanted resets.
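To illustrate how the refresh and reset thresholds might interact, here is a hypothetical sketch. It assumes (this has not been checked against the tmLQCD source) that both thresholds are measured in molecular-dynamics units (MDU) since the last reset or refresh, that a reset takes precedence over a refresh, and that `MGRefreshSetupMaxSolverIterations` only caps the work done during a refresh. The types and `mg_decide` are illustrative names.

```c
/* Hypothetical mirror of the three input parameters discussed above. */
typedef struct {
  double reset_mdu_threshold;   /* MGResetSetupMDUThreshold */
  double refresh_mdu_threshold; /* MGRefreshSetupMDUThreshold */
  int refresh_max_solver_iter;  /* MGRefreshSetupMaxSolverIterations: assumed
                                   to cap the solver during a refresh; not
                                   used in the decision below */
} mg_update_params;

typedef enum { MG_KEEP = 0, MG_REFRESH, MG_RESET } mg_action;

/* MD time (in MDU) at which the setup was last reset / refreshed. */
typedef struct {
  double last_reset;
  double last_refresh;
} mg_update_state;

/* Decide what to do with the MG setup at MD time `mdu` and update the state.
 * A full reset takes precedence over a refresh and also counts as one. */
mg_action mg_decide(const mg_update_params *p, mg_update_state *s, double mdu) {
  if (mdu - s->last_reset >= p->reset_mdu_threshold) {
    s->last_reset = mdu;
    s->last_refresh = mdu;
    return MG_RESET;
  }
  if (mdu - s->last_refresh >= p->refresh_mdu_threshold) {
    s->last_refresh = mdu;
    return MG_REFRESH;
  }
  return MG_KEEP;
}
```

Under these assumptions, setting the reset threshold to the trajectory length means the setup is regenerated from scratch at most once per trajectory, with cheaper refreshes in between; a smaller reset threshold would trigger full setups mid-trajectory.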
Fixes #494
Behaviour along a trajectory (keep in mind this is 2-level MG and still faster than the corresponding CG, at least on my test machine):
GCR: Convergence at 114 iterations, L2 relative residual: iterated = 2.623039e-14, true = 2.623039e-14 (requested = 3.275482e-14)
GCR: Convergence at 62 iterations, L2 relative residual: iterated = 3.251018e-14, true = 3.251018e-14 (requested = 3.270564e-14)
GCR: Convergence at 145 iterations, L2 relative residual: iterated = 3.080900e-13, true = 3.080900e-13 (requested = 3.270564e-13)
GCR: Convergence at 178 iterations, L2 relative residual: iterated = 3.387833e-13, true = 3.387833e-13 (requested = 3.560340e-13)
GCR: Convergence at 58 iterations, L2 relative residual: iterated = 2.449851e-13, true = 2.449851e-13 (requested = 3.275482e-13)
GCR: Convergence at 61 iterations, L2 relative residual: iterated = 2.722033e-13, true = 2.722033e-13 (requested = 3.565397e-13)
GCR: Convergence at 57 iterations, L2 relative residual: iterated = 2.777510e-13, true = 2.777510e-13 (requested = 3.275680e-13)
GCR: Convergence at 61 iterations, L2 relative residual: iterated = 3.175343e-13, true = 3.175343e-13 (requested = 3.565370e-13)
GCR: Convergence at 57 iterations, L2 relative residual: iterated = 2.938131e-13, true = 2.938131e-13 (requested = 3.275665e-13)
GCR: Convergence at 61 iterations, L2 relative residual: iterated = 2.998862e-13, true = 2.998862e-13 (requested = 3.565381e-13)
GCR: Convergence at 58 iterations, L2 relative residual: iterated = 2.756795e-13, true = 2.756795e-13 (requested = 3.275843e-13)
GCR: Convergence at 63 iterations, L2 relative residual: iterated = 2.546070e-13, true = 2.546070e-13 (requested = 3.565364e-13)
GCR: Convergence at 148 iterations, L2 relative residual: iterated = 3.054770e-13, true = 3.054770e-13 (requested = 3.270728e-13)
GCR: Convergence at 177 iterations, L2 relative residual: iterated = 3.075183e-13, true = 3.075183e-13 (requested = 3.560334e-13)
GCR: Convergence at 148 iterations, L2 relative residual: iterated = 3.156105e-13, true = 3.156105e-13 (requested = 3.270727e-13)
GCR: Convergence at 179 iterations, L2 relative residual: iterated = 3.130081e-13, true = 3.130081e-13 (requested = 3.560336e-13)
GCR: Convergence at 63 iterations, L2 relative residual: iterated = 2.702935e-13, true = 2.702935e-13 (requested = 3.276044e-13)
GCR: Convergence at 67 iterations, L2 relative residual: iterated = 3.237049e-13, true = 3.237049e-13 (requested = 3.565343e-13)
GCR: Convergence at 63 iterations, L2 relative residual: iterated = 2.763467e-13, true = 2.763467e-13 (requested = 3.276029e-13)
GCR: Convergence at 67 iterations, L2 relative residual: iterated = 3.152173e-13, true = 3.152173e-13 (requested = 3.565354e-13)
GCR: Convergence at 58 iterations, L2 relative residual: iterated = 2.380761e-13, true = 2.380761e-13 (requested = 3.276222e-13)
GCR: Convergence at 62 iterations, L2 relative residual: iterated = 2.374933e-13, true = 2.374933e-13 (requested = 3.565350e-13)
GCR: Convergence at 149 iterations, L2 relative residual: iterated = 3.181837e-13, true = 3.181837e-13 (requested = 3.270916e-13)
GCR: Convergence at 178 iterations, L2 relative residual: iterated = 3.553751e-13, true = 3.553751e-13 (requested = 3.560336e-13)
GCR: Convergence at 59 iterations, L2 relative residual: iterated = 2.543924e-13, true = 2.543924e-13 (requested = 3.276432e-13)
GCR: Convergence at 62 iterations, L2 relative residual: iterated = 3.405832e-13, true = 3.405832e-13 (requested = 3.565331e-13)
GCR: Convergence at 59 iterations, L2 relative residual: iterated = 2.581603e-13, true = 2.581603e-13 (requested = 3.276417e-13)
GCR: Convergence at 62 iterations, L2 relative residual: iterated = 3.373860e-13, true = 3.373860e-13 (requested = 3.565342e-13)
GCR: Convergence at 60 iterations, L2 relative residual: iterated = 3.153152e-13, true = 3.153152e-13 (requested = 3.276593e-13)
GCR: Convergence at 63 iterations, L2 relative residual: iterated = 3.480639e-13, true = 3.480639e-13 (requested = 3.565331e-13)
GCR: Convergence at 147 iterations, L2 relative residual: iterated = 3.124625e-13, true = 3.124625e-13 (requested = 3.271104e-13)
GCR: Convergence at 181 iterations, L2 relative residual: iterated = 3.450262e-13, true = 3.450262e-13 (requested = 3.560343e-13)
GCR: Convergence at 147 iterations, L2 relative residual: iterated = 3.080868e-13, true = 3.080868e-13 (requested = 3.271103e-13)
GCR: Convergence at 179 iterations, L2 relative residual: iterated = 3.505172e-13, true = 3.505172e-13 (requested = 3.560345e-13)
GCR: Convergence at 65 iterations, L2 relative residual: iterated = 2.560672e-13, true = 2.560672e-13 (requested = 3.276746e-13)
GCR: Convergence at 68 iterations, L2 relative residual: iterated = 3.287401e-13, true = 3.287401e-13 (requested = 3.565310e-13)
GCR: Convergence at 65 iterations, L2 relative residual: iterated = 2.558926e-13, true = 2.558926e-13 (requested = 3.276730e-13)
GCR: Convergence at 68 iterations, L2 relative residual: iterated = 3.251902e-13, true = 3.251902e-13 (requested = 3.565322e-13)
GCR: Convergence at 60 iterations, L2 relative residual: iterated = 2.459156e-13, true = 2.459156e-13 (requested = 3.276810e-13)
GCR: Convergence at 63 iterations, L2 relative residual: iterated = 3.308923e-13, true = 3.308923e-13 (requested = 3.565324e-13)
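Logs like the one above can be summarized with a small helper. This is a sketch that assumes exactly the line layout shown; `gcr_record`, `parse_gcr_line`, `gcr_converged` and `mean_gcr_iterations` are illustrative names. It checks that the true residual meets the requested tolerance and averages the iteration counts.

```c
#include <stdio.h>

/* One parsed "GCR: Convergence at ..." line. */
typedef struct {
  int iters;
  double iterated, true_res, requested;
} gcr_record;

/* Parses a GCR convergence line in the format printed above.
 * Returns 1 on success, 0 if the line does not match. */
int parse_gcr_line(const char *line, gcr_record *r) {
  int n = sscanf(line,
                 "GCR: Convergence at %d iterations, L2 relative residual: "
                 "iterated = %lf, true = %lf (requested = %lf)",
                 &r->iters, &r->iterated, &r->true_res, &r->requested);
  return n == 4;
}

/* 1 if the true residual meets the requested tolerance. */
int gcr_converged(const gcr_record *r) { return r->true_res <= r->requested; }

/* Mean iteration count over n lines; non-matching lines are skipped. */
double mean_gcr_iterations(const char *const *lines, int n) {
  long total = 0;
  int count = 0;
  gcr_record r;
  for (int i = 0; i < n; ++i) {
    if (parse_gcr_line(lines[i], &r)) {
      total += r.iters;
      ++count;
    }
  }
  return count ? (double)total / count : 0.0;
}
```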
Might be useful to start updating the documentation at this stage...
This also works well with a 3-level MG in a simulation of D15.48 on 6 nodes of Juwels Booster (using the 1.1.x branch of QUDA, of course). A trajectory of unit length takes 2200 seconds without any tuning of the MG or integrator setup. Of these 2200 seconds, about 87.5 seconds go into setup evolution and MG parameter updates.
The time spent on updates can be reduced significantly by implementing true parameter updates in the `thin_update_only` mode of `updateMultigridQuda`.
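For scale, the 87.5 seconds quoted above correspond to roughly 4% of the trajectory time:

```math
\frac{87.5\ \mathrm{s}}{2200\ \mathrm{s}} \approx 0.040
```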
I had forgotten to enable GDR (GPUDirect RDMA). With it enabled, and with the job placed in a single bcell for all node counts, trajectory timings and scaling are better:
| nodes | traj. time (s) | GCR (QpQm solve) | notes |
|---|---|---|---|
| 3 | 3173 | 63 iter / 3.27874 secs = 15742.6 Gflops | |
| 6 | 1829 | 61 iter / 2.16022 secs = 23188.8 Gflops | |
| 12 | 1225 | 77 iter / 2.439 secs = 32756.6 Gflops | slightly different (worse) MG setup! |
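The strong-scaling behaviour in the table can be quantified with two one-line helpers (illustrative names, not from tmLQCD). Applied to the trajectory times above, going from 3 to 6 nodes gives a speedup of about 1.73 (parallel efficiency about 87%), and going to 12 nodes a speedup of about 2.59 (efficiency about 65%), the latter with the different MG setup noted in the table.

```c
/* Strong-scaling speedup of a run taking time t relative to a baseline run
 * taking t_base (same problem, more nodes). */
double speedup(double t_base, double t) { return t_base / t; }

/* Parallel efficiency: achieved speedup divided by the ideal speedup
 * nodes / nodes_base; 1.0 means perfect strong scaling. */
double parallel_efficiency(int nodes_base, double t_base, int nodes, double t) {
  return speedup(t_base, t) * (double)nodes_base / (double)nodes;
}
```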
The gauge derivative contribution (timings divided by two because I ran two trajectories) is still significant on Juwels Booster for this lattice size, although on 12 nodes it accounts for only about 10% of the total.
Getting `deriv_Sb` running on the device will also be necessary.
ping @urbach
Before any more work is done, this branch should be merged into `quda_work_hmc`, which should in turn be merged into `quda_work_add_actions`.
4 because we have four dimensions
okay, makes sense!
refreshes the setup correctly but then leads to the solve failing with a QUDA error coming from GCR
Will open an issue with the QUDA devs to see what we're doing incorrectly.