PonyGE / PonyGE2

PonyGE2: grammatical evolution and variants in Python
GNU General Public License v3.0
154 stars 91 forks source link

Superfluous deep_copy() calls #89

Open jmmcd opened 6 years ago

jmmcd commented 6 years ago

It looks to me that these deep_copy() calls are superfluous -- some crossover operators may need them (to avoid changing existing individuals), but others don't. I guess that they may be slow. So maybe the responsibility to avoid changing the input parents should be on the crossover operator itself, not here.

https://github.com/PonyGE/PonyGE2/blob/40355f668757c8daa2503301c8c0eda54d566945/src/operators/crossover.py#L55 https://github.com/PonyGE/PonyGE2/blob/40355f668757c8daa2503301c8c0eda54d566945/src/operators/crossover.py#L56

mikefenton commented 6 years ago

When we do crossover on individuals we select pairs of individuals at random from the original parent pool. Since we can select the same individual multiple times, we have to create a new copy of them each time we select them from the original pool, as otherwise the original individual will be changed in-place and the next time they are selected for crossover from the pool they will be the changed version, not the original version.

jmmcd commented 6 years ago

My point is that only some crossover operators make changes to their inputs in-place.

mikefenton commented 6 years ago

Right, but the main crossover operator doesn't take this in to account, it just randomly selects pairs of individuals until the child pool is full. We don't know what the individual crossover functions are going to do, so to avoid any unwanted effects (i.e. people writing their own crossover functions and getting unexpected results) we do it once where you have indicated.

mikefenton commented 6 years ago

We can do it in the individual crossover functions, but we would need to add it in to the docs that any new functions may need to also do this to avoid over-writing parents as operations on individuals are applied inplace.

jmmcd commented 6 years ago

Yeah that's what I was thinking of. But yeah the disadvantage is people could then shoot themselves in the foot.

jmmcd commented 6 years ago

After working with it a bit more, I now understand that the deep_copy() is not doing so much work except in the case of derivation-tree genomes. And I was able work around another problem that the deep_copy() was causing me. So, I'm happy to drop this now -- thanks.

jmmcd commented 6 years ago

Hmmm... does the same reasoning apply to mutation also? I guess it might not, if mutation is only ever applied to individuals which have just been created by crossover. That's a property of many GP algorithms, but not all. In some algorithms, some new individuals are created by mutation of an individual in the current population. At the moment there's no deep_copy() for mutation.

MihaiBabiac commented 2 years ago

I actually managed to shoot myself in the foot because the importance of that deep_copy is not so obvious. I'd advocate for shifting it outside of the cross-over function to make it an obvious step of decoupling the old population from the new population. Alternatively it should be made clear in the docs that operators should never affect the input individual (and the current mutation and crossover code should be modified to follow this) and then the deep_copy is not necessary.

What I did was try to disable crossover completely by making operators.crossover.crossover just return the input population. I thought it was smart to skip the unnecessary work in those functions if I didn't need crossovers, but this lead to the deep_copy() calls not happening.

The issue is that the other operators collude to do unexpected things then:

Regarding the choice of parents to be used for cross-over, it might make sense to put this in a separate function. The current approach using sampling with replacement leads "with high probability" to some individuals not being used at all and others being used multiple times. This still happens even if crossover probability is zero, which is quite unexpected. Shifting the choice of parents to a function outside of crossover would also provide a much better location for doing the deep copies.