Parent selection for crossover

N-Wouda commented 2 years ago

[Based on what we dicussed today.]

Currently we do two binary tournaments to select the two parents that will be used for crossover. If the two parents are identical, we attempt to find another one that is not.

@leonlan had an idea to select only the first parent by binary tournament. Then, the second parent can be selected based on proximity (broken pairs distance). This ensures we find parents that are somewhat similar, so the offspring can benefit from their 'good genes' (and, hopefully, not need so much local search after crossover). The downside is a potential lack of diversity, so perhaps we should also make sure that's not impacted too much.

N-Wouda commented 2 years ago

I'm going to test the following simple ideas first:

Binary tournament, with one parent from feas, and the other from the infeas population
First parent binary tournament, other selected randomly from its proximity structure. Possible downsides are low diversity, and that the parents will both be either feas or infeas.

Just to get a feel for what matters (feas/infeas pairs, or 'nearness' in diversity/genes).

leonlan commented 2 years ago

I got this idea from the following quote:

Indeed, recombining only neighboring solutions may improve the efficiency of crossover operators [209]. Talbi 2021

The quote above refers to this paper [209]. I don't know anything about path relinking, but assuming that it's just a crossover, then Section 3.5 describes some upper bound on the distance between the initial and guiding solution (parents?):

The path solution selection method (lines 13 and 24 in Algorithm 1) has the goal of choosing a solution on the path explored via path relinking. Specifically, it first constructs a candidate solution list (CSL) that includes the solutions with a Hamming distance of at least d/3 from both the initiating and guiding solutions. Then, the solution in CSL with the maximum objective value is chosen for further amelioration by the solution improvement method described in Sect. 3.6.

leonlan commented 2 years ago

I think what also could be interesting is to set a upper/lower bound on the diversity between two parents. For instance, we select two parents. If div(par1, par2) < lower bound or div(par1, par2) > upper bound, then do another binary tournament to select a new parent 2. I think 0.05 as lower bound and 0.40 as upper bound are reasonable values.

N-Wouda commented 2 years ago

@leonlan yes, I was planning on doing something similar! I'll test that in a bit. First feas/infeas.

N-Wouda commented 2 years ago

Binary tournament, with one parent from feas, and the other from the infeas population

This is slightly worse: 164363 (+50 from main) average cost, 54709 iterations (~+3.5K from main). Now testing the second simple idea: one from binary tournament, the other parent from the nearest nbClose individuals in diversity of the first parent.

N-Wouda commented 2 years ago

First parent binary tournament, other selected randomly from its proximity structure.

This is also worse: 164620 average cost, with 66196 iterations. So our intuition is correct: having similar parents results in less work for the local search. But we clearly pay for the lack in diversity, with an increased average objective. I'll try some of the bounded diversity ideas next.

N-Wouda commented 2 years ago

If div(par1, par2) < lower bound or div(par1, par2) > upper bound, then do another binary tournament to select a new parent 2. I think 0.05 as lower bound and 0.40 as upper bound are reasonable values.

I'm testing this now with the suggested defaults.

N-Wouda commented 2 years ago

That results in 164348 average cost, with 54034 iterations. So +30-ish from main on cost. I'm doing another run without the lower bound set to zero to determine which part seems to be the problem (too much or too little diversity).

N-Wouda commented 2 years ago

Interestingly, setting the lower bound to zero increases the average cost slightly (+15), and does not seem to change the number of iterations too much. So perhaps the upper bound of 0.40 is too low. I'll try, say, 0.60 next (together with 0.05 for the lower bound, as before).

N-Wouda commented 2 years ago

That has resulted in the best static benchmark we've found so far: 164306 average cost, with 51072 iterations. So there might be some merit in bounding the diversity a bit, but we should probably think a bit better on how to go about this exactly.

N-Wouda commented 2 years ago

Bounds of [0.05, 0.5] is again slightly better, largely because it results in slightly more iterations. Average cost 164303, 51933 iterations.

So something like this could work.

N-Wouda commented 2 years ago

I've also tried [0.1, 0.5], but the benefits of increasing the lower bound is very small (slightly better, at 164302, but again, that's basically meaningless). Reducing the upper bound to 0.45 is not helpful.

So [0.1, 0.5] seems to be about the best we can do.

N-Wouda / Euro-NeurIPS-2022

Parent selection for crossover #144