Closed urbach closed 2 years ago
@kostrzewa commented on this pull request.
@@ -89,6 +89,13 @@ typedef enum ExternalInverter_s { QPHIX_INVERTER } ExternalInverter;
+/* enumeration type for the external inverter */
@urbach does it perhaps make sense to merge this and ExternalInverter?
thought about it, but then decided against it.
ExternalLibrary is going to put the whole GaugeUpdate on the GPU, I hope. In the other case only the inverter is on the GPU. So, in my opinion two very different things. However, I can easily change this, if there are good arguments.
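To make the distinction concrete, a minimal sketch of keeping the two choices as separate enumerations might look like the following. Only `ExternalInverter` and `QPHIX_INVERTER` appear in the diff above; the `ExternalLibrary` typedef, the enumerators `NO_EXT_INV`, `NO_EXT_LIB` and `QUDA_LIB`, the struct and the helper function are hypothetical names used purely for illustration.

```c
/* Sketch only: apart from ExternalInverter and QPHIX_INVERTER, all names
 * below are hypothetical and not taken from the tmLQCD sources. */

/* which external library, if any, solves the linear systems */
typedef enum ExternalInverter_s { NO_EXT_INV = 0, QPHIX_INVERTER } ExternalInverter;

/* which external library, if any, takes over the force / gauge update */
typedef enum ExternalLibrary_s { NO_EXT_LIB = 0, QUDA_LIB } ExternalLibrary;

/* a monomial can then carry both choices independently */
typedef struct {
  ExternalInverter inverter;
  ExternalLibrary library;
} monomial_offload;

/* returns 1 if the force of this monomial should be computed externally */
static int force_is_offloaded(const monomial_offload *m) {
  return m->library != NO_EXT_LIB;
}
```

With two types, a monomial that only uses an external inverter and one that is offloaded as a whole cannot be confused at the call site, which is the argument made above for not merging them.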
@urbach should we split the work on this somehow? In principle I have some time available in the next two weeks here or there.
sure that would be great. I need to understand first, what the QUDA routine actually does. Maybe @sbacchio can help there?
Hi yes I can also help. Do you want to have any discussion on how to distribute it? and what you have in mind for now?
discussion would be good. Bartek certainly has more insight already. The following steps are needed/unclear:
- computeGaugeForceQuda and how to call it
- computeGaugeForceQuda updates the momenta directly. How to join this with the tmLQCD momenta?
- gauge_derivative first and then generalise when it is working, or do we start directly with a more general interface function?
what about a short meeting today at 1pm or 2pm?
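One possible answer to the momenta question, sketched under assumptions: if the external routine updates whatever momentum buffer it is handed in place, it can be given a zeroed scratch buffer whose contents are then accumulated into the host momenta. The callback type `ext_force_fn`, the function `accumulate_external_force`, the flat `double` layout and the toy `constant_force` are all hypothetical. This does not call QUDA and says nothing about its actual data layout or sign convention.

```c
#include <stddef.h>

/* Hypothetical stand-in for an external force routine that updates the
 * momentum buffer it is given in place, here as mom += dt * F. */
typedef void (*ext_force_fn)(double *mom, size_t n, double dt);

/* Toy force with F = 2 for every component, for illustration only. */
static void constant_force(double *mom, size_t n, double dt) {
  for (size_t i = 0; i < n; ++i) mom[i] += dt * 2.0;
}

/* Let the external routine work on a zeroed scratch buffer, then
 * accumulate the result into the host momenta. */
static void accumulate_external_force(double *host_mom, double *scratch,
                                      size_t n, double dt, ext_force_fn force) {
  for (size_t i = 0; i < n; ++i) scratch[i] = 0.0;
  force(scratch, n, dt); /* scratch now holds dt * F */
  for (size_t i = 0; i < n; ++i) host_mom[i] += scratch[i];
}
```

Calling `accumulate_external_force(mom, scratch, n, dt, constant_force)` adds `dt * 2.0` to every component of `mom`. The extra buffer costs memory but keeps the host momenta untouched until the external result is known, which also makes it easy to compare against a host-side force first.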
Both timings would be fine for me
Same here although I would slightly prefer 2pm
then let's say 2pm.
Hi
sorry I'm late, where can I join?
Just doing some dummy work to see if I can easily get a working call to the QUDA gauge force set up. Refinements can be made later (such as doing the projection onto the adjoint rep already in QUDA).
Hi Bartek, just so that we don't do the work twice: we are working on it. We have almost everything figured out, and at some point early next week we are going to push the implementation here. @pittlerf is working on it with me.
Alright, good to know. I'll push my latest additions (which still result in a segfault unfortunately) in a separate branch and you can then force-push here if you'd like.
ok great!
Would be good to have some WIP commits to look at if possible.
ok, we still have some clean-up to do; later today.
no worries, I didn't mean to imply any pressure, I'm just really interested in what you came up with
@kostrzewa can we be added to etmc so we can push here directly? At the moment changes are under https://github.com/pittlerf/tmLQCD/tree/testing_gauge_force_ferenc
@pittlerf was already a member and I've just added you, @sbacchio
Thanks @pittlerf and @sbacchio. Needs some minor conflict resolution
We checked the current implementation serially and, as far as we can tell, it matches the gauge_derivative implementation. Now we have an issue running in parallel: it dies in the initialization of QUDA. We still have to investigate further.
It (e.g. tmLQCD's hmc_tm or invert or offline_measurement) launches fine in parallel for me, but I have some dH issues when the gauge monomial is on the GPU (which are likely to do with the fact that I forgot to pull in the changes to QUDA that you've made w.r.t. the double alignment). The lib is recompiling as we speak and I'll test again in a few minutes.
I'm afraid that also with the additional commits in feature/ndeg-twisted-clover I see large dH in a pure-gauge run using the QUDA version of the gauge force.
Thanks for checking, but yes it is still WIP.. so far we did unit tests with our executable.
The parallel initialization issue is now solved: we only had problems when running interactively. Using a script, it also works in parallel.
Hi Bartek, could we have your input script? We have implemented some checks and would like to try them.
Sure, I just do something like:
L=16
T=32
nrxprocs = 1
nryprocs = 1
nrzprocs = 1
ompnumthreads = 3
Measurements = 10000
StartCondition = hot
NSave = 500000
ThetaT = 1.0
UseEvenOdd = yes
ReversibilityCheck = no
ReversibilityCheckIntervall = 100
DebugLevel = 2
BeginMonomial GAUGE
Type = Iwasaki
beta = 1.9
Timescale = 0
UseExternalLibrary = quda
EndMonomial
BeginIntegrator
Type0 = 2MN
IntegrationSteps0 = 100
Tau = 1.0
Lambda0 = 0.193
NumberOfTimescales = 1
EndIntegrator
for pure gauge.
Thank you, we also see the problem and are now trying to identify the issue.
For a full nf=2+1+1 twisted clover run (based on cA211.53.24 but on a 16c32 lattice instead):
NrXProcs = 1
NrYProcs = 1
NrZProcs = 1
ompnumthreads = 6
L=16
T=32
Measurements = 1000
# StartCondition = hot
StartCondition = continue
InitialStoreCounter = readin
2KappaMu = 0.0014846837
2KappaMuBar = 0.0394421632
2KappaEpsBar = 0.0426076209
CSW = 1.74
kappa = 0.1400645
NSave = 500000
ThetaT = 1.0
UseEvenOdd = yes
ReversibilityCheck = no
ReversibilityCheckIntervall = 100
DebugLevel = 2
# StrictResidualCheck = yes
UseRelativePrecision = yes
BeginExternalInverter QUDA
Pipeline = 10
gcrNkrylov = 20
MGCoarseMuFactor = 1.0, 1.0, 20.0
MGNumberOfLevels = 3
MGNumberOfVectors = 24, 24, 24
MGSetupSolver = cg
MGSetup2KappaMu = 0.0014846837
MGVerbosity = silent, silent, silent
MGSetupSolverTolerance = 5e-7, 5e-7
MGSetupMaxSolverIterations = 1500, 1500
MGCoarseSolverType = gcr, gcr, cagcr
MgCoarseSolverTolerance = 0.1, 0.1, 0.1
MGCoarseMaxSolverIterations = 15, 15, 15
MGSmootherType = cagcr, cagcr, cagcr
MGSmootherTolerance = 0.2, 0.2, 0.2
MGSmootherPreIterations = 0, 0, 0
MGSmootherPostIterations = 4, 4, 4
MGBlockSizesX = 4,2
MGBlockSizesY = 4,2
MGBlockSizesZ = 4,2
MGBlockSizesT = 4,2
MGOverUnderRelaxationFactor = 0.90, 0.90, 0.90
MGResetSetupMDUThreshold = 1.0
MGRefreshSetupMDUThreshold = 0.06249
MGRefreshSetupMaxSolverIterations = 15, 15
EndExternalInverter
BeginMeasurement CORRELATORS
Frequency = 1
EndMeasurement
BeginMonomial GAUGE
Type = Iwasaki
beta = 1.726
Timescale = 0
UseExternalLibrary = no
EndMonomial
BeginMonomial CLOVERDET
Timescale = 1
kappa = 0.1400645
2KappaMu = 0.0014846837
CSW = 1.74
rho = 0.09353509
MaxSolverIterations = 1000
AcceptancePrecision = 1.e-21
ForcePrecision = 1.e-16
Name = cloverdetlight
solver = cg
useexternalinverter = quda
usesloppyprecision = half
EndMonomial
BeginMonomial CLOVERDETRATIO
Timescale = 2
kappa = 0.1400645
2KappaMu = 0.0014846837
rho = 0.01039279
rho2 = 0.09353509
CSW = 1.74
MaxSolverIterations = 500
AcceptancePrecision = 1.e-21
ForcePrecision = 1.e-18
Name = cloverdetratio1light
solver = mg
useexternalinverter = quda
usesloppyprecision = single
EndMonomial
BeginMonomial CLOVERDETRATIO
Timescale = 3
kappa = 0.1400645
2KappaMu = 0.0014846837
rho = 0.0
rho2 = 0.01039279
CSW = 1.74
MaxSolverIterations = 500
AcceptancePrecision = 1.e-21
ForcePrecision = 1.e-18
Name = cloverdetratio2light
solver = mg
useexternalinverter = quda
usesloppyprecision = single
EndMonomial
BeginMonomial NDCLOVERRAT
Timescale = 1
kappa = 0.1400645
CSW = 1.74
AcceptancePrecision = 1e-21
ForcePrecision = 1e-16
StildeMin = 0.0000376
StildeMax = 4.7
MaxSolverIterations = 500
Name = ndcloverrat_0_3
DegreeOfRational = 10
Cmin = 0
Cmax = 3
ComputeEVFreq = 0
2Kappamubar = 0.0394421632
2Kappaepsbar = 0.0426076209
AddTrLog = yes
useexternalinverter = quda
usesloppyprecision = single
solver = cgmmsnd
EndMonomial
BeginMonomial NDCLOVERRAT
Timescale = 2
kappa = 0.1400645
CSW = 1.74
MaxSolverIterations = 1000
AcceptancePrecision = 1e-21
ForcePrecision = 1e-16
# lambda_min = 8e-6 (min evals go as low as 1.5e-5), maximal evals are found as high as 0.85 and fluctuate strongly
StildeMin = 0.0000376
StildeMax = 4.7
Name = ndcloverrat_4_6
DegreeOfRational = 10
Cmin = 4
Cmax = 6
ComputeEVFreq = 0
2Kappamubar = 0.0394421632
2Kappaepsbar = 0.0426076209
AddTrLog = no
useexternalinverter = quda
usesloppyprecision = single
solver = cgmmsnd
EndMonomial
BeginMonomial NDCLOVERRAT
Timescale = 3
kappa = 0.1400645
CSW = 1.74
AcceptancePrecision = 1e-21
ForcePrecision = 1e-16
MaxSolverIterations = 5000
StildeMin = 0.0000376
StildeMax = 4.7
Name = ndcloverrat_7_9
DegreeOfRational = 10
Cmin = 7
Cmax = 9
ComputeEVFreq = 0
2Kappamubar = 0.0394421632
2Kappaepsbar = 0.0426076209
AddTrLog = no
useexternalinverter = quda
usesloppyprecision = single
solver = cgmmsnd
EndMonomial
BeginMonomial NDCLOVERRATCOR
Timescale = 1
kappa = 0.1400645
CSW = 1.74
AcceptancePrecision = 1e-20
ForcePrecision = 1e-16
MaxSolverIterations = 5000
StildeMin = 0.0000376
StildeMax = 4.7
Name = ndcloverratcor
DegreeOfRational = 10
ComputeEVFreq = 0
2Kappamubar = 0.0394421632
2Kappaepsbar = 0.0426076209
useexternalinverter = quda
usesloppyprecision = double
solver = cgmmsnd
EndMonomial
BeginIntegrator
Type0 = 2MNFG
Type1 = 2MNFG
Type2 = 2MNFG
Type3 = 2MNFG
IntegrationSteps0 = 1
IntegrationSteps1 = 1
IntegrationSteps2 = 1
IntegrationSteps3 = 8
Tau = 1.0
Lambda0 = 0.166667
Lambda1 = 0.166667
Lambda2 = 0.166667
Lambda3 = 0.166667
NumberOfTimescales = 4
EndIntegrator
BeginOperator CLOVER
kappa = 0.1400645
2KappaMu = 0.0014846837
CSW = 1.74
UseEvenOdd = no
SolverPrecision = 1e-20
MaxSolverIterations = 500
UseExternalInverter = QUDA
Solver = mg
usesloppyprecision = single
EndOperator
Actually, I found an issue with the OpenMP threads; I will upload the fix asap.
@kostrzewa I tried to push the fix, but it seems that I do not have the right to push here:
ERROR: Permission to etmc/tmLQCD.git denied to pittlerf.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.
Can you give me write permission? Thanks
You should be able to push. How have you configured the remote?
Actually you are right, the base permission was "read" for some reason. You should be able to push now.
thank you :)
Very nice, the last commit has done the trick. I have some more instrumentation which I'd like to push if you don't mind, and one small performance improvement.
of course, go ahead :)
Thanks! It would be great if we could resolve the merge conflict and go over the remaining issues over the next few days.
I think I've addressed the points in 6f579c8 and 7f61ee1.
I realised that the merge conflict was rather major so I went ahead and fixed it, it wasn't particularly clear...
We also added an online checking tool, which can be used for debugging. Shall we remove the executable test_gauge_derivative then, or keep it?
Probably better to remove it since it might break and then we need to take care of it.
I've made some small adjustments and think that this can be merged. The gauge derivative regularly produces relative deviations in excess of 1e-10 on my machine, so I've increased the threshold. I don't think that the deviations that I see are worrisome, however, as they occur only when the component in question is of a similar size to the threshold. I've made program termination dependent on g_strict_residual_check, which allows one to study the behaviour of the deviations on all lattice sites over multiple trajectories.
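The kind of check described in this comment can be sketched roughly as follows. This is not the code from the PR: the function name `count_rel_deviations`, its signature, the flat array layout and the reporting format are hypothetical, and the `strict` argument merely plays the role that `g_strict_residual_check` has in the comment.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch only: name, signature and output format are hypothetical.
 * Compares two force fields component by component and counts relative
 * deviations above `threshold`. Terminates only if `strict` is non-zero. */
static size_t count_rel_deviations(const double *ref, const double *test,
                                   size_t n, double threshold, int strict) {
  size_t n_bad = 0;
  for (size_t i = 0; i < n; ++i) {
    const double diff = fabs(ref[i] - test[i]);
    /* fall back to the absolute difference if the reference vanishes */
    const double rel = (ref[i] != 0.0) ? diff / fabs(ref[i]) : diff;
    if (rel > threshold) {
      ++n_bad;
      fprintf(stderr, "component %zu: ref=%e test=%e rel. dev.=%e\n",
              i, ref[i], test[i], rel);
    }
  }
  if (strict && n_bad > 0) {
    exit(EXIT_FAILURE); /* strict mode: stop at the first failing comparison */
  }
  return n_bad;
}
```

With `strict` set to 0 the function only reports and counts, so the deviations can be collected for all components over many trajectories. A purely relative measure is sensitive for very small components, which is consistent with the observation above that deviations show up when the component is of a similar size to the threshold.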
some steps towards gauge update on GPUs