Closed N-Wouda closed 1 year ago
We should probably have different configurations for different instance sizes. A natural split seems to be in three parts, like how the quali/final run-times are split.
We also might have to consider separate parameter tuning for the dynamic instances. The dynamic dispatch instances are very different from those of the static problem because of their relatively tight time windows and small instance sizes.
Based on greedy/hindsight/rollout, a rough estimate is that 40% of the instances have between 100 and 150 customers, 40% between 50 and 100, 10% fewer than 50, and 10% between 150 and 200.
We should have different configurations for each of the three instance sizes (of <300, 300-500, >500 clients), since they get different time limits and are in general very different in size and/or shape.
We currently have these parameters (22 in total):
Param | Default value | Description |
---|---|---|
nbIter | 10K | Number of iterations without improvement before restarting |
initialTimeWarpPenalty | 1 | Initial penalty for time warp (time-window violations) |
nbPenaltyManagement | 100 | Number of iterations between penalty updates |
feasBooster | 2 | Penalty increase if no feasible solutions yet |
penaltyIncrease | 1.2 | Penalty increase if insufficient (but not zero) feasible solutions |
penaltyDecrease | 0.85 | Penalty decrease if too many feasible solutions |
minPopSize | 25 | Minimum population size |
generationSize | 40 | minPopSize + generationSize is the maximum population size before culling |
nbElite | 4 | Number of elite individuals that are biased to survive the survival mechanism |
nbClose | 5 | Number of closest individuals to calculate average diversity |
targetFeasible | 40% | Target percentage of generated pops that were feasible (between penalty updates) |
nbKeepOnRestart | 1 | Number of best feasible pops to keep on restart |
repairProbability | 50% | Probability of attempting to repair infeasible offspring |
repairBooster | 10 | Penalty booster when repairing (hammering out infeasibilities) |
selectProbability | 90% | Geometric selection probability of best crossover offspring |
nbGranular | 40 | Number of 'near' clients explored by node operators in local search |
weightWaitTime | 2 | Penalty for waiting time when determining granular neighborhoods |
weightTimeWarp | 10 | Penalty for time warps when determining granular neighborhoods |
circleSectorOverlapTolerance | 0 degrees | Minimum circle sector overlap (in degrees) before route operators are applied |
minCircleSectorSize | 15 degrees | Minimum circle sector size (in degrees); computed sectors are enlarged to at least this |
destroyPct | 20% | Percentage of the solution to destroy in broken pairs exchange crossover |
postProcessPathLength | 4 | Subpath length to optimally recombine by enumeration in local search postprocessing |
Additionally, if we have to tune the node, route, and the crossover operators (another 10-15 or so) as well, we have maybe 40 parameters in total. That's quite a lot. Can we shorten this list somehow, or do you know of a tuning algorithm that can deal with this? I'm not sure SMAC can.
This probably also needs its own static configuration, particularly for the simulations (since those need to be fast). The configuration for the epoch optimisation after rollout can probably be shared with that for the small instances.
Rollout needs a static parameter configuration (for the simulations), and the following additional parameters:
Param | Default value | Description |
---|---|---|
N_LOOKAHEAD | 3 | Number of periods to simulate ahead |
SIM_TLIM_FACTOR | 50% | Percentage of the epoch time limit to take for simulating |
SIM_SOLVE_ITERS | 25 | Number of static solver iterations to use in a simulation |
DISPATCH_THRESHOLD | 15% | Threshold to clear before dispatching a customer in current epoch |
Each parameter lies in some [min, max] range. One simple way to get a feel for this would be to run two benchmarks for each parameter value: one where we increase it from the default to the max value, and one where we decrease it to the min value. That's doable, and should quickly give us a feel for which parameters really matter, and which do not seem to be as relevant. Then we can take the subset of parameters that really matter and tune those together.
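Scripted, that sweep could look something like this (a sketch; `run_benchmark` is a hypothetical stand-in for an actual benchmark run, and only three parameters are shown):

```python
# One-at-a-time sensitivity sweep: for each parameter, benchmark the
# default configuration with that parameter pushed to its min and its max.
DEFAULTS = {"nbIter": 10_000, "repairProbability": 50, "nbGranular": 40}
RANGES = {
    "nbIter": (1_000, 10_000),
    "repairProbability": (0, 100),
    "nbGranular": (10, 100),
}


def run_benchmark(config):
    # Hypothetical stand-in: this would run the solver on the benchmark
    # instances and return the average cost. Here it returns a dummy score.
    return sum(config.values())


def oat_sweep(defaults, ranges, benchmark):
    results = {}
    for name, (lo, hi) in ranges.items():
        scores = {}
        for value in (lo, hi):
            config = dict(defaults, **{name: value})  # change one parameter
            scores[value] = benchmark(config)
        results[name] = scores
    return results


results = oat_sweep(DEFAULTS, RANGES, run_benchmark)
```

Parameters whose min/max scores differ a lot from the default's score are the ones worth tuning together afterwards.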
@leonlan @LuukPentinga @jaspervd96 @rijalarpan @MarjoleinAerts @sbhulai: what do you think?
It is a mighty load of parameters to tune. I think the min and max idea may not work fully because, in my experience, some of them behave non-linearly. I would consider a set of values for each parameter and test them in a full factorial way. Then, as you suggested, go deeper into a few of them. I do not think parameter optimization packages like the ones available for machine learning will work here.
I agree with @rijalarpan; these parameters usually behave non-linearly (e.g., destroyPct), so we need more than min/max values. A factorial design should be manageable, and I can also help tune part of the parameters. Some other thoughts:
I also filled in the remaining ?? for some parameters (nbElite, nbClose, weightTimeWarp, weightWaitTime).
A factorial design should be manageable
We have 40+ parameters. How is even the simplest factorial design (roughly 2^40) manageable?
A factorial design should be manageable
We have 40+ parameters. How is even the simplest factorial design (roughly 2^40) manageable?
It's not 😅 My understanding of the (full) factorial design was flawed. Let me rephrase. A full factorial design is not feasible, but it should be manageable to perform a factorial design for smaller logical parameter groups.
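To illustrate the size difference (a sketch with made-up level choices for one subgroup): a full factorial over all ~40 parameters is astronomically large, but per logical group it stays small.

```python
from itertools import product

# Hypothetical levels for one small "penalty management" subgroup.
levels = {
    "penaltyIncrease": [1.1, 1.2, 1.5],
    "penaltyDecrease": [0.7, 0.85, 0.95],
    "targetFeasible": [0.2, 0.4, 0.6],
}

# Full factorial over this group: 3 * 3 * 3 = 27 configurations,
# versus 2**40 ≈ 10**12 for even a two-level design over 40 parameters.
names = list(levels)
configs = [dict(zip(names, combo)) for combo in product(*levels.values())]
```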
Agree with the comments to check a couple of values for the parameters and then focus on the ones with the most impact. Min and max might not give good indications, as also suggested in the comments above
Could a procedure comparable to variable neighbourhood descent be pragmatic here? It would follow a similar procedure:
Then you can see whether a full-factorial set-up of the limited set of parameters that proved most valuable in the above process is worthwhile, and focus on those.
Here are our current parameters and operators organized per category. I like the suggestion by @rijalarpan to do a VND kind of approach to optimize this, but I think it's best done per parameter group instead of per single parameter. Those parameter groups are not super big, so within each (sub)group you could do a full-factorial approach.
Added three more node ops that we keep forgetting about :-).
I like the idea of doing things by logical group. If we want to do a factorial design we need to come up with levels for the non-binary parameters (e.g. what's 'low' or 'high' for nbGranular?).
We might also just dump every group into, say, SMAC or ParamILS and have that thing tune for us. The number of parameters we have in each group is manageable for these algorithms, so I suspect they can get us a reasonably good configuration quite quickly. If the resulting configuration of tuning each group in isolation is not already effective, we can use the tuning trajectories to identify which parameters should be tuned together in a follow-up step.
@rijalarpan @leonlan [and anyone else that's interested] shall we have a short meeting tomorrow at, say, 15h to discuss this specific subject further? I can do anytime tomorrow, except 14-15h.
I'm available all day before 15:30. So 15:00-15:30 I could meet.
I am away for holidays but I can find time if between 09:00-10:00.
Tomorrow at 9 works for me. I'll send a Google meet in a bit.
Here's the link: https://meet.google.com/ekp-weba-xci. See you tomorrow!
What we just discussed:
After this Friday (14 October) I would like to freeze any new static changes, and focus solely on tuning the static solver. So any planned or open PRs that impact the static solver should ideally be merged before next weekend!
There's a lot of stuff here. Including SMAC, and several other options. I'll have a detailed look tomorrow and make a brief list of algorithms that seem to work well, and that we can get working easily from Python.
A brief update on static tuning (#134):
I think that I can get a (noisy) evaluation of a single configuration down to about the qualification run time, which is more or less the best possible level of parallelisation we can hope for. But that'll require some further work to make happen.
Proposed ranges for each parameter (for now only for the three groups that we want to tune first):
Param | Default value | Min | Max |
---|---|---|---|
initialTimeWarpPenalty | 1 | 1 | 100 |
nbPenaltyManagement | 100 | 25 | 500 |
feasBooster | 2 | 1 | 10 |
penaltyIncrease | 1.2 | 1 | 5 |
penaltyDecrease | 0.85 | .25 | 1 |
targetFeasible | 40% | 0% | 100% |
repairProbability | 50% | 0% | 100% |
repairBooster | 10 | 1 | 25 |
Param | Default value | Min | Max |
---|---|---|---|
minPopSize | 25 | 5 | 100 |
generationSize | 40 | 1 | 100 |
nbElite | 4 | 0 | 25 |
nbClose | 5 | 1 | 25 |
Param | Default value | Min | Max |
---|---|---|---|
nbIter | 10K | 1K | 10K |
nbKeepOnRestart | 0 | 0 | 5 |
Please feel free to remark on these suggested ranges. For some of them I have some intuition, for others I'm mostly guessing.
Parameter ranges look good to me. Is there also a need for step sizes or does SMAC figure that out itself?
Is there also a need for step sizes or does SMAC figure that out itself?
There's the option to indicate this, yes, but SMAC can figure it out on its own as well! Edit: I've decided to do the quantization ourselves. This is the full configuration now (including suggestions for the other parameters):
# Penalty management
Integer("initialTimeWarpPenalty", (1, 25), default=1),
Integer("nbPenaltyManagement", (25, 500), default=100, q=25),
Float("feasBooster", (1, 10), default=2.0, q=0.1),
Float("penaltyIncrease", (1, 5), default=1.2, q=0.1),
Float("penaltyDecrease", (0.25, 1), default=0.85, q=0.05),
Float("targetFeasible", (0, 1), default=0.4, q=0.05),
Integer("repairProbability", (0, 100), default=50, q=5),
Integer("repairBooster", (1, 25), default=10),
# Population management
Integer("minPopSize", (5, 100), default=25, q=5),
Integer("generationSize", (0, 100), default=40, q=5),
Integer("nbElite", (0, 25), default=4, q=5),
Integer("nbClose", (1, 25), default=5, q=4),
# Restart mechanism
Integer("nbIter", (1_000, 10_000), default=10_000, q=250),
Integer("nbKeepOnRestart", (0, 5), default=0),
# Crossover
Integer("selectProbability", (50, 100), default=90, q=10),
Integer("destroyPct", (5, 50), default=20, q=5),
Integer("brokenPairsExchange", (0, 1), default=0),
Integer("selectiveRouteExchange", (0, 1), default=1),
# Node ops
Integer("Exchange10", (0, 1), default=1),
Integer("Exchange11", (0, 1), default=1),
Integer("Exchange20", (0, 1), default=1),
Integer("MoveTwoClientsReversed", (0, 1), default=1),
Integer("Exchange21", (0, 1), default=1),
Integer("Exchange22", (0, 1), default=1),
Integer("TwoOpt", (0, 1), default=1),
Integer("Exchange31", (0, 1), default=0),
Integer("Exchange32", (0, 1), default=0),
Integer("Exchange33", (0, 1), default=0),
# Granular neighborhoods
Integer("nbGranular", (10, 100), default=40, q=5),
Integer("weightWaitTime", (1, 25), default=2),
Integer("weightTimeWarp", (1, 25), default=10),
# Intensification (and route ops)
Integer("shouldIntensify", (0, 1), default=1),
Integer("circleSectorOverlapTolerance", (0, 359), default=0),
Integer("minCircleSectorSize", (0, 359), default=15),
Integer("postProcessPathLength", (1, 8), default=7),
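For reference, the `q` values above mean that sampled values snap to a grid. A plain-Python sketch of that quantisation (what the `q` fields mean, not SMAC's internals):

```python
def quantize(value, lo, hi, q=1):
    # Round to the nearest multiple of q, then clamp into [lo, hi].
    snapped = round(value / q) * q
    return max(lo, min(hi, snapped))
```

For example, with `nbPenaltyManagement` bounds (25, 500) and `q=25`, a raw sample of 47 snaps to 50.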
@leonlan @rijalarpan what do you think of my suggested configuration space above? Each field reads as `<param type>(<param name>, (<low>, <high>), default=<default>)`, with an optional step size/quantisation `q`.
I propose we do not test RELOCATE and SWAP separately: they're not that expensive now that we do intensification only when we find a new best solution. More depends on whether we do intensification in the first place (i.e., whether `shouldIntensify` is true or not).
@leonlan are these conditionals correct?
# Only valid parameters when we do, in fact, intensify
EqualsCondition(cs["circleSectorOverlapTolerance"],
cs["shouldIntensify"],
1),
EqualsCondition(cs["minCircleSectorSize"],
cs["shouldIntensify"],
1),
EqualsCondition(cs["postProcessPathLength"],
cs["shouldIntensify"],
1),
# Repair booster only makes sense when we can actually repair
NotEqualsCondition(cs["repairBooster"],
cs["repairProbability"],
0),
# Parameter is specific to BPX
EqualsCondition(cs["destroyPct"],
cs["brokenPairsExchange"],
1),
where `EqualsCondition(a, b, val)` means `a` is an active parameter only when `b` has value `val`, and `NotEqualsCondition(a, b, val)` means `a` is active only when `b` does not have value `val`.
Have I missed any?
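To make those semantics concrete, a plain-Python sketch (not the ConfigSpace API) of when each conditional parameter would be active, using a hypothetical example configuration:

```python
def equals_condition(config, parent, value):
    # Child parameter is active only when config[parent] == value.
    return config[parent] == value


def not_equals_condition(config, parent, value):
    # Child parameter is active only when config[parent] != value.
    return config[parent] != value


# Hypothetical configuration: intensify, never repair, use BPX crossover.
config = {"shouldIntensify": 1, "repairProbability": 0, "brokenPairsExchange": 1}

active = {
    "circleSectorOverlapTolerance": equals_condition(config, "shouldIntensify", 1),
    "repairBooster": not_equals_condition(config, "repairProbability", 0),
    "destroyPct": equals_condition(config, "brokenPairsExchange", 1),
}
```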
The ranges of parameters look wide enough. My concern would have been the step size but if SMAC takes care of it, then all looks good. Fingers crossed!
Integer("nbElite", (0, 25), default=4, q=5),
Integer("nbClose", (1, 25), default=5, q=4),
I suggest changing these step sizes to `q=1`. I don't have a good reason why, but I feel that (0, 4, 9, 14, ...) for `nbElite` is a bit too coarse; idem for `nbClose`.
Have I missed any [conditions]? There's one more for population: `nbElite <= minPopSize + generationSize`.
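That constraint could be checked like so (a sketch; in ConfigSpace itself this would likely be expressed as a forbidden clause rather than a condition):

```python
def is_valid_population(config):
    # nbElite individuals must fit within the maximum population size,
    # which is minPopSize + generationSize before culling.
    return config["nbElite"] <= config["minPopSize"] + config["generationSize"]
```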
I'm running a hundred scenarios for penalty management right now. Should be done in, hopefully, a day or so.
For the next parameter group (after penalty management), I propose the following:
Param | Default value | Min | Max |
---|---|---|---|
minPopSize | 25 | 5 | 100 |
generationSize | 40 | 1 | 100 |
nbElite | 4 | 0 | 25 |
lbDiversity | 10% | 0% | 25% |
ubDiversity | 50% | 25% | 100% |
nbClose | 5 | 1 | 25 |
nbIter | 10K | 1K | 10K |
I'm grouping the `nbIter` parameter with 'population management', and ignoring `nbKeepOnRestart` (so we keep that fixed at zero). That's a bit prescriptive, but the benefit is that this saves us one small parameter group to tune.
I'm running a hundred scenarios for penalty management right now. Should be done in, hopefully, a day or so.
They completed a few hours ago. I have finished a first analysis of them, and obtained a parameter configuration that's slightly better than what we had:
initialTimeWarpPenalty = 6
nbPenaltyManagement = 47
feasBooster = 2.5
penaltyIncrease = 1.3434343434343434
penaltyDecrease = 0.32196969696969696
targetFeasible = 0.4292929292929293
repairProbability = 79
repairBooster = 12
On the benchmark instances, this gets an average cost of 164276 (more than 25 better than current main).
With this new penalty configuration in place, I just started tuning the population parameters. Expect that again in a day or so.
For the next parameter group (after population management), I propose the following:
Param | Default value | Min | Max |
---|---|---|---|
nbGranular | 40 | 10 | 100 |
weightWaitTime | 2 | 1 | 25 |
weightTimeWarp | 10 | 1 | 25 |
circleSectorOverlapTolerance | 0 | 0 | 359 |
minCircleSectorSize | 15 | 0 | 359 |
postProcessPathLength | 6 | 1 | 8 |
where we keep `shouldIntensify` fixed at 1 (it's not costly anyway, and the last three parameters already toggle a lot of the cost/intensity).
I think we can skip the node and crossover operators for now: there we already know quite well what works.
It seems that tuning the population parameters hasn't resulted in a significant improvement. There's one set of values that seems slightly better, so I'm testing that now. But it might not pan out - the defaults here were already quite good.
None of the population parameter sets appear to be better than our current default: the one promising set I found gets an average objective +15 above our current settings. So we should probably keep the defaults for population management.
I'll set up the LS parameters now, and run that. That's the last set of static parameters to tune!
The LS stuff is in now too, and there's potential here for another thirty-ish points improvement. There are a few candidate parameter settings that I'll benchmark later today, and then I'll pick the best settings based on what comes out of that.
@leonlan @jaspervd96 before tuning the dynamic part: is there anything in the works there still that I have to wait for?
Still trying all kinds of variations I can think of, but (from my side at least) still nothing that clearly beats the original. So I would not wait for that: start tuning soon to have enough time for it.
In the "worst case", we might at the last minute find something better, tuned the same as the original rather than specifically tuned for it.
Small addition: We should decide though if we can/want to tune certain parameters separately per epoch.
E.g. not one fixed threshold, but a separate threshold for each epoch.
Small addition: We should decide though if we can/want to tune certain parameters separately per epoch.
I talked to @leonlan about this, and we figured we might try a different threshold for epoch 0, 1, and >1. Or just 0 and >0, because the first epoch is a bit special.
Raw configurations + results of the static tuning runs are available here.
The best configuration is this one:
nbGranular = 34
weightWaitTime = 18
weightTimeWarp = 20
circleSectorOverlapToleranceDegrees = 216
minCircleSectorSizeDegrees = 332
postProcessPathLength = 7
with an average cost of 164228, and 48234 iterations.
minCircleSectorSizeDegrees = 332
This means that we're more or less trying all routes in the intensification phase. Based on that, I'm also trying something where we get rid of the whole circle sector overlap thing and just try all routes.
It seems that tuning the population parameters hasn't resulted in a significant improvement. There's one set of values that seems slightly better, so I'm testing that now. But it might not pan out - the defaults here were already quite good.
I think it may be worthwhile to tune `nbIter` independently. Correct me if I'm wrong, but the current parameter tuning tests an entire configuration and, consequently, any positive effect of changing `nbIter` is probably negated by negative effects of changes in the other population parameters.
@leonlan @jaspervd96 before tuning the dynamic part: is there anything in the works there still that I have to wait for?
I don't have any major changes, but I'm undecided whether we should use `n_lookahead` equal to 1, 2 or 3. For each value, it seems that we can get similar performance when setting the corresponding threshold to 35, 25 and 15%, respectively. We could use parameter tuning to figure out which number of lookahead periods to use, but the threshold ranges and values would have to differ in each case. Is this doable? On the other hand, it shouldn't matter much if we just fix 1, 2 or 3, because the performance of each is very similar, and then we reduce the number of parameters to tune.
minCircleSectorSizeDegrees = 332
This means that we're more or less trying all routes in the intensification phase. Based on that, I'm also trying something where we get rid of the whole circle sector overlap thing and just try all routes.
This gets 164224 average cost with 48905 iterations, so more or less the same outcome. It's a lot simpler, however, so I'll open a PR removing the circle sector stuff.
For tuning dynamic: a full dynamic run on eight cores takes about 3h20min, so 3h40min should be plenty on the cluster.
A dynamic instance is derived from a static instance. This process uses some randomness, so there's a seed controlling it (`--instance_seed`). The solver also uses randomness, and a different seed controls that (`--solver_seed`). This results in different variabilities over ten runs with different seeds:
On the qualification and finals, we will be evaluated with different solver seeds (the instances - and their seeds - are fixed for us). So it's good that the variability in that is quite a bit smaller.
For rollout, we have the following parameters
Parameter | Default | Meaning |
---|---|---|
rollout_tlim_factor | 0.7 | Fraction of epoch time to spend on simulations; the rest goes to solving the dynamic instance. |
n_cycles | 1 | Number of simulation cycles to perform |
n_simulations | 50 | Number of simulations in each cycle |
n_lookahead | 1 | Number of epochs to sample ahead in each simulation |
n_requests | 100 | Number of requests per sampled epoch |
dispatch_threshold | 0.35 | We dispatch if the average number of times a request is dispatched (across the simulations) is above this threshold. |
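The dispatch rule in the last row can be sketched as follows (assuming each simulation yields a 0/1 'dispatched' indicator per request; a sketch, not the actual solver code):

```python
def should_dispatch(indicators, threshold=0.35):
    # indicators: per-simulation 0/1 flags for one request.
    # Dispatch when the fraction of simulations that dispatched
    # this request clears the threshold.
    return sum(indicators) / len(indicators) > threshold
```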
I propose the following ranges:
Parameter | Default | Min | Max |
---|---|---|---|
rollout_tlim_factor | 0.7 | 0.6 | 1.0 |
n_cycles | 1 | 1 | 3 |
n_simulations | 50 | 25 | 100 |
n_lookahead | 1 | 1 | 5 |
n_requests | 100 | 50 | 100 |
dispatch_threshold | 0.35 | 0 | 1.0 |
But I do not have a lot of intuition for this. @leonlan @jaspervd96 what do you think?
Most parameter ranges look fine. The only change I suggest is to make dispatch_threshold depend on n_lookahead. All other parameters can be sampled independently.
n_lookahead=1 should take dispatch threshold (20, 35, 50) as (min, default, max)
For 2, (10, 25, 40)
For 3, (5, 15, 30)
For 4 and 5 I don’t know, but something like (5, 10, 20) should work. Maybe even skip 4, I don’t think it’s worth it to try both 4 and 5 because they don’t differ a lot from each other.
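Collected as a lookup table (a sketch; the entries for 4 and 5 lookahead periods are the rough guesses above):

```python
# (min, default, max) dispatch_threshold per n_lookahead, in percent.
THRESHOLD_RANGES = {
    1: (20, 35, 50),
    2: (10, 25, 40),
    3: (5, 15, 30),
    4: (5, 10, 20),  # rough guess
    5: (5, 10, 20),  # rough guess
}


def threshold_range(n_lookahead):
    return THRESHOLD_RANGES[n_lookahead]
```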
There are lots of parameters to the HGS algorithm, not many of which seem to be tuned particularly well. At some point, we should run a tuning tool (e.g., smac) to determine a good set of parameters.
Parameters, in this sense, also includes "which operators to use" (see also e.g. #32).