kostrzewa opened 2 years ago
The preliminary idea for the input is as follows, but this will have to be fine-tuned depending on how the algorithm turns out in the end:
BeginExternalInverter QUDA
Pipeline = 24
gcrNkrylov = 24
MGNumberOfLevels = 3
MGNumberOfVectors = 24, 32
MGSetupSolver = cg
MGSetup2KappaMu = 0.000224102400
MGVerbosity = summarize, silent, silent
MGSetupSolverTolerance = 5e-7, 5e-7
MGSetupMaxSolverIterations = 1500, 1500
MGCoarseSolverType = gcr, gcr, cagcr
MGSmootherType = cagcr, cagcr, cagcr
MGBlockSizesX = 4,3
MGBlockSizesY = 4,3
MGBlockSizesZ = 3,2
MGBlockSizesT = 4,2
MGCoarseMuFactor = 1.0, 1.0, 20.0
MGCoarseMaxSolverIterations = 50, 50, 50
MGCoarseSolverTolerance = 0.1, 0.1, 0.1
MGSmootherPostIterations = 2, 2, 2
MGSmootherPreIterations = 0, 0, 0
MGSmootherTolerance = 0.1, 0.1, 0.1
MGOverUnderRelaxationFactor = 0.85, 0.85, 0.85
EndExternalInverter
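As an aside, the per-level comma-separated lists in a block like the one above are naturally read as one value per MG level. A hypothetical parsing sketch (not the actual tmLQCD input reader; all names here are illustrative):

```python
# Illustrative sketch: parse "Key = v1, v2, v3" lines into per-level lists.
# Lines without "=" (e.g. Begin/End markers) are simply skipped.
def parse_mg_block(text):
    params = {}
    for line in text.strip().splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = [v.strip() for v in value.split(",")]
    return params

block = """
BeginExternalInverter QUDA
MGCoarseMuFactor = 1.0, 1.0, 20.0
MGSmootherType = cagcr, cagcr, cagcr
EndExternalInverter
"""
mg = parse_mg_block(block)
```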
BeginTuneMGParams QUDA
MGCoarseMuFactorSteps = 10, 10, 10
MGCoarseMuFactorDelta = 0.1, 0.2, 5
MGCoarseMaxSolverIterationsSteps = 10, 10, 10
MGCoarseMaxSolverIterationsDelta = -5, -5, -5
MGCoarseSolverToleranceSteps = 10, 10, 10
MGCoarseSolverToleranceDelta = 0.05, 0.05, 0.05
MGSmootherPreIterationsSteps = 4, 4, 4
MGSmootherPreIterationsDelta = 1, 1, 1
MGSmootherPostIterationsSteps = 4, 4, 4
MGSmootherPostIterationsDelta = 1, 1, 1
MGSmootherToleranceSteps = 4, 4, 4
MGSmootherToleranceDelta = 0.1, 0.1, 0.1
MGOverUnderRelaxationFactorSteps = 4, 4, 4
MGOverUnderRelaxationFactorDelta = 0.05, 0.05, 0.05
MGTuningIterations = 1000
# when in a particular tuning step the improvement is less than 1%, we
# move on to the next parameter to be tuned
MGTuningTolerance = 0.99
EndTuneMGParams
There may be some adaptive process added to dynamically reduce the search space if certain parameter changes don't affect the time to solution (TTS).
I will probably change the input format such that one doesn't specify a min/max and a number of steps, but rather a "delta" for each parameter and level, together with the number of steps that this delta should be applied for.
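The delta/steps scheme described above amounts to a coordinate-descent search. A minimal sketch under that reading (hypothetical names, not the tmLQCD implementation): each parameter is nudged by its per-level delta for at most the given number of steps, a change is kept only if it improves the time to solution by more than 1% (corresponding to MGTuningTolerance = 0.99), and the search then moves on to the next parameter.

```python
# Hypothetical coordinate-descent sketch of the delta/steps tuning scheme.
def tune(params, deltas, steps, time_to_solution, tol=0.99):
    best_time = time_to_solution(params)
    for name in deltas:
        for level in range(len(params[name])):
            for _ in range(steps[name][level]):
                trial = {k: list(v) for k, v in params.items()}
                trial[name][level] = params[name][level] + deltas[name][level]
                t = time_to_solution(trial)
                if t < tol * best_time:  # > 1% improvement: accept and continue
                    best_time, params = t, trial
                else:                    # stagnation: next parameter/level
                    break
    return params, best_time

# Toy benchmark with a known optimum at mu_factor = 4.0
def toy_tts(p):
    return abs(p["mu_factor"][0] - 4.0) + 1.0

best, t = tune({"mu_factor": [1.0]}, {"mu_factor": [1.0]},
               {"mu_factor": [5]}, toy_tts)
```

On the toy benchmark the search walks mu_factor from 1.0 to the optimum at 4.0 and stops once the next step no longer improves the timing by more than 1%.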
The current "algorithm" (I use the word very cautiously) can start from a completely useless setup which doesn't converge and find one which does. Unfortunately, it doesn't yet find a better minimum than I can find by hand. However, I've tested this only on small lattices (16c32 and 24c48, albeit at the physical point) and I suspect that it will work better on larger lattices.
Funnily enough, this actually works and seems to find parameter sets that I would have never considered. For example, on cA211.12.48, this is a parameter set that it evolves to:
QUDA-MG param tuner: BEST SET OF PARAMETERS
-------------------------------------------
mg_mu_factor: (1.000000, 3.000000, 27.000000)
mg_coarse_solver_maxiter: (20, 10, 50)
mg_coarse_solver_tol: (0.200000, 0.400000, 0.200000)
mg_nu_post: (6, 6, 8)
mg_nu_pre: (0, 4, 2)
mg_smoother_tol: (0.200000, 0.200000, 0.100000)
mg_omega: (0.950000, 1.050000, 0.850000)
Timing: 1.989135, Iters: 51
-------------------------------------------
First experience on a large volume (64c128) at the physical point suggests that this tuner, surprisingly, really seems to work.
Setting
BeginTuneMGParams QUDA
MGCoarseMuFactorSteps = 10, 10, 11
MGCoarseMuFactorDelta = 0.25, 0.5, 5
MGCoarseMaxSolverIterationsSteps = 10, 10, 10
MGCoarseMaxSolverIterationsDelta = 5, 5, 5
MGCoarseSolverToleranceSteps = 10, 10, 10
MGCoarseSolverToleranceDelta = 0.05, 0.05, 0.05
MGSmootherPreIterationsSteps = 2, 2, 2
MGSmootherPreIterationsDelta = 1, 1, 1
MGSmootherPostIterationsSteps = 2, 2, 2
MGSmootherPostIterationsDelta = 2, 2, 2
MGSmootherToleranceSteps = 4, 4, 4
MGSmootherToleranceDelta = 0.1, 0.1, 0.1
MGOverUnderRelaxationFactorSteps = 3, 3, 3
MGOverUnderRelaxationFactorDelta = 0.05, 0.05, 0.05
MGTuningIterations = 1000
# when in a particular tuning step the improvement is less than 1%, we
# move on to the next parameter to be tuned
MGTuningTolerance = 0.99
EndTuneMGParams
and starting from
BeginExternalInverter QUDA
Pipeline = 24
gcrNkrylov = 24
MGNumberOfLevels = 3
MGNumberOfVectors = 24, 32
MGSetupSolver = cg
MGSetup2KappaMu = 0.000215613244
MGVerbosity = silent, silent, silent
MGSetupSolverTolerance = 5e-7, 5e-7
MGSetupMaxSolverIterations = 1500, 1500
MGCoarseSolverType = gcr, gcr, cagcr
MGSmootherType = cagcr, cagcr, cagcr
MGBlockSizesX = 4,2
MGBlockSizesY = 4,2
MGBlockSizesZ = 4,2
MGBlockSizesT = 4,2
MGResetSetupMDUThreshold = 1.0
MGRefreshSetupMDUThreshold = 0.0263
MGRefreshSetupMaxSolverIterations = 30, 30
MGCoarseMuFactor = 1.0, 1.0, 20.0
MGCoarseMaxSolverIterations = 15, 15, 15
MGCoarseSolverTolerance = 0.1, 0.1, 0.1
MGSmootherPostIterations = 2, 2, 2
MGSmootherPreIterations = 0, 0, 0
MGSmootherTolerance = 0.1, 0.1, 0.1
MGOverUnderRelaxationFactor = 0.90, 0.90, 0.90
EndExternalInverter
the tuner takes the solver from non-convergence through a successful solve in around 9 seconds (on Meluxina):
QUDA-MG param tuner: BEST SET OF PARAMETERS
-------------------------------------------
mg_mu_factor: (1.000000, 1.000000, 65.000000)
mg_coarse_solver_maxiter: (15, 15, 15)
mg_coarse_solver_tol: (0.100000, 0.100000, 0.100000)
mg_nu_post: (2, 2, 2)
mg_nu_pre: (0, 0, 0)
mg_smoother_tol: (0.100000, 0.100000, 0.100000)
mg_omega: (0.900000, 0.900000, 0.900000)
Timing: 8.628203, Iters: 112
-------------------------------------------
down to a solve in 2.5 seconds with parameters that I would not have thought to choose by hand:
QUDA-MG param tuner: BEST SET OF PARAMETERS
-------------------------------------------
mg_mu_factor: (1.000000, 4.000000, 120.000000)
mg_coarse_solver_maxiter: (15, 25, 30)
mg_coarse_solver_tol: (0.100000, 0.200000, 0.150000)
mg_nu_post: (2, 6, 10)
mg_nu_pre: (0, 0, 6)
mg_smoother_tol: (0.200000, 0.200000, 0.200000)
mg_omega: (0.900000, 0.900000, 0.950000)
Timing: 2.501800, Iters: 64
-------------------------------------------
Using these parameters in practice and comparing the "hand-tuned" setup on the left with the auto-tuned setup on the right:
MGCoarseMuFactor = 1.0, 1.0, 80.0 -> MGCoarseMuFactor = 1.0, 4.0, 120.0
MGCoarseMaxSolverIterations = 30, 30, 30 -> MGCoarseMaxSolverIterations = 15, 25, 30
MGCoarseSolverTolerance = 0.3, 0.2, 0.15 -> MGCoarseSolverTolerance = 0.1, 0.2, 0.15
MGSmootherPostIterations = 4, 4, 6 -> MGSmootherPostIterations = 2, 6, 10
MGSmootherPreIterations = 0, 0, 1 -> MGSmootherPreIterations = 0, 0, 6
MGSmootherTolerance = 0.2, 0.2, 0.2 -> MGSmootherTolerance = 0.2, 0.2, 0.2
MGOverUnderRelaxationFactor = 1.00, 0.90, 0.90 -> MGOverUnderRelaxationFactor = 0.90, 0.90, 0.95
I seem to obtain very stable timings so far (red is the auto-tuned MG setup).
After some more runtime, extracting the time to solution of the two MG setups, I get the following histograms after resampling to get the same number of solver calls in both cases (logarithmic count axis).
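The resampling step mentioned above (drawing the same number of solver calls from each setup before histogramming) might look like the following; this is a sketch of an assumed approach, not the actual analysis script:

```python
import random

# Sketch: subsample the larger list of time-to-solution measurements so
# that both setups contribute the same number of solver calls.
def equalize_samples(tts_a, tts_b, seed=7):
    rng = random.Random(seed)
    n = min(len(tts_a), len(tts_b))
    return rng.sample(tts_a, n), rng.sample(tts_b, n)

hand_tuned = [8.5, 8.7, 8.6, 8.9]  # illustrative timings in seconds
auto_tuned = [2.4, 2.5]
a_eq, b_eq = equalize_samples(hand_tuned, auto_tuned)
```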
Doing the same on an L=48 simulation at the physical point similarly leads to a very nice improvement. Below, "untuned" refers to a hand-selected MG setup, "mk1tuned" refers to the auto-tuning result after about 100 tuning iterations, and "mk2tuned" to the setup which was reached at the end of the tuning procedure. The two "peaks" correspond to inversions related to cloverdetratio2light (below and around 1 second in the tuned setups) and cloverdetratio3light (from 1.5 seconds and up); both the timings from the HB/ACC steps and those from the derivative are included in the histograms.
The final setup is:
MGCoarseMuFactor = 1.0, 2.5, 105.0
MGCoarseMaxSolverIterations = 15, 15, 15
MGCoarseSolverTolerance = 0.1, 0.35, 0.25
MGSmootherPostIterations = 2, 2, 4
MGSmootherPreIterations = 0, 0, 1
MGSmootherTolerance = 0.2, 0.1, 0.2
MGOverUnderRelaxationFactor = 0.90, 0.90, 1.00
Note to self from the meeting just now: it should be possible to integrate this directly in the HMC: every N trajectories, enter the MG tuning loop for k iterations in an attempt to stabilize the MG.
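The note-to-self above might translate into control flow like the following; this is purely illustrative, with hypothetical names standing in for the actual HMC and tuner entry points:

```python
# Illustrative sketch: every N trajectories, run k MG tuning iterations
# to re-stabilize the MG setup as the gauge field evolves.
def hmc_with_mg_tuning(run_trajectory, tune_mg_step, n_traj, N, k):
    tuned = 0
    for traj in range(1, n_traj + 1):
        run_trajectory()
        if traj % N == 0:        # enter the tuning loop every N trajectories
            for _ in range(k):   # k tuning iterations per entry
                tune_mg_step()
                tuned += 1
    return tuned

# 10 trajectories, tuning every 5 trajectories with 3 iterations each
n_tuning_steps = hmc_with_mg_tuning(lambda: None, lambda: None, 10, 5, 3)
```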
Started work on a simple algorithm to automatically tune those (QUDA-)MG parameters which can be tuned without rebuilding the setup.