aarongarrett / inspyred

Python library for bio-inspired computational intelligence
MIT License
187 stars 58 forks source link

'plus' replacement in replacers.py might have an issue #27

Closed albertotonda closed 8 months ago

albertotonda commented 1 year ago

Description

By performing debug printouts, I noticed something weird in inspyred.ec.replacers.plus_replacement ; in theory, "plus" should take the current population + the offspring, sort them, and return the best len(population) individuals as the survivors. However, unless I made a mistake with my debug printouts, it looks like that the 'parent' list in the argument contains only individuals that were actually selected for reproduction. So, the code for plus_replacement that goes:

    pool = list(offspring)
    pool.extend(parents)
    pool.sort(reverse=True)
    survivors = pool[:len(population)]
    return survivors

is not correct, as it creates a list with all individuals which reproduced + the offspring. In this way, it actually creates a 'survivors' list with potentially multiple copies of the same individual (if, for example, one individual was selected for reproduction multiple times). Correcting this should be easy enough:

    pool = list(offspring)
    **pool.extend(population)**
    pool.sort(reverse=True)
    survivors = pool[:len(population)]
    return survivors

However, I would double-check all other replacers, because they might have similar issues if they assume that 'parents' contains the current population and not just a list (with potential duplicates!) of the individuals that were selected for reproduction.

Here below is a printout of the content of 'parents', 'population', and 'offspring' for a problem where I have a custom individual.candidate genome (akin to Genetic Programming). I also have unique ids for the individuals. As you can see, while 'population' contains one copy of each individual, 'parents' has several copies of individuals 1 and 4, which were selected for reproduction multiple times.

[DEBUG 2023-07-21 10:09:01] Initial state of the parents:
[DEBUG 2023-07-21 10:09:01] 1:'u' : '(cos(P)/(P-P))'
[DEBUG 2023-07-21 10:09:01] 4:'u' : 'sin(cos(((P-P)+(L/P))))'
[DEBUG 2023-07-21 10:09:01] 1:'u' : '(cos(P)/(P-P))'
[DEBUG 2023-07-21 10:09:01] 1:'u' : '(cos(P)/(P-P))'
[DEBUG 2023-07-21 10:09:01] 4:'u' : 'sin(cos(((P-P)+(L/P))))'
[DEBUG 2023-07-21 10:09:01] 5:'u' : 'max(cos(min(L,(0.6575))),min(min(L,L),((0.6309)+(0.5804))))'
[DEBUG 2023-07-21 10:09:01] 9:'u' : '(((min(P,P)-max(L,P))*(min(L,(-0.9861))+min(L,P)))/(-0.6634))'
[DEBUG 2023-07-21 10:09:01] 4:'u' : 'sin(cos(((P-P)+(L/P))))'
[DEBUG 2023-07-21 10:09:01] 2:'u' : '(max(L,L)+max((-0.0879),(0.9665)))'
[DEBUG 2023-07-21 10:09:01] 8:'u' : 'sin(min(max(L,(0.7915)),(P/(0.2088))))'

[DEBUG 2023-07-21 10:09:01] Initial state of the population:
[DEBUG 2023-07-21 10:09:01] 0:'u' : 'cos(sin((-0.8001)))'
[DEBUG 2023-07-21 10:09:01] 1:'u' : '(cos(P)/(P-P))'
[DEBUG 2023-07-21 10:09:01] 2:'u' : '(max(L,L)+max((-0.0879),(0.9665)))'
[DEBUG 2023-07-21 10:09:01] 3:'u' : 'sqrt(sin(min((L-P),max(P,P))))'
[DEBUG 2023-07-21 10:09:01] 4:'u' : 'sin(cos(((P-P)+(L/P))))'
[DEBUG 2023-07-21 10:09:01] 5:'u' : 'max(cos(min(L,(0.6575))),min(min(L,L),((0.6309)+(0.5804))))'
[DEBUG 2023-07-21 10:09:01] 6:'u' : 'max(max((0.7001),P),sin(P))'
[DEBUG 2023-07-21 10:09:01] 7:'u' : '(cos((0.1225))*sqrt(L))'
[DEBUG 2023-07-21 10:09:01] 8:'u' : 'sin(min(max(L,(0.7915)),(P/(0.2088))))'
[DEBUG 2023-07-21 10:09:01] 9:'u' : '(((min(P,P)-max(L,P))*(min(L,(-0.9861))+min(L,P)))/(-0.6634))'

[DEBUG 2023-07-21 10:09:01] Initial state of the offspring:
[DEBUG 2023-07-21 10:09:01] 10:'u' : '(((P-P)+(L/P))/(P-P))'
[DEBUG 2023-07-21 10:09:01] 11:'u' : '(sin(P)/min(P,L))'
[DEBUG 2023-07-21 10:09:01] 12:'u' : 'sin(cos(min((P-P),max(P,P))))'
[DEBUG 2023-07-21 10:09:01] 13:'u' : 'L'
[DEBUG 2023-07-21 10:09:01] 14:'u' : 'sin(min(max(L,(0.7915)),(P/(0.2088))))'
aarongarrett commented 1 year ago

The plus replacement is implemented that way by design. The documentation says it's a combination of parents and offspring, rather than the entire population plus offspring. It's a design choice rather than a mistake.

Aaron

-- Aaron Garrett

On Fri, Jul 21, 2023 at 4:32 AM Alberto Tonda @.***> wrote:

  • inspyred version: 1.0.1
  • Python version: 3.10.9
  • Operating System: Windows

Description

By performing debug printouts, I noticed something weird in inspyred.ec.replacers.plus_replacement ; in theory, "plus" should take the current population + the offspring, sort them, and return the best len(population) individuals as the survivors. However, unless I made a mistake with my debug printouts, it looks like that the 'parent' list in the argument contains only individuals that were actually selected for reproduction. So, the code for plus_replacement that goes:

pool = list(offspring)
pool.extend(parents)
pool.sort(reverse=True)
survivors = pool[:len(population)]
return survivors

is not correct, as it creates a list with all individuals which reproduced

  • the offspring. In this way, it actually creates a 'survivors' list with potentially multiple copies of the same individual (if, for example, one individual was selected for reproduction multiple times). Correcting this should be easy enough:

    pool = list(offspring) pool.extend(population) pool.sort(reverse=True) survivors = pool[:len(population)] return survivors

However, I would double-check all other replacers, because they might have similar issues if they assume that 'parents' contains the current population and not just a list (with potential duplicates!) of the individuals that were selected for reproduction.

Here below is a printout of the content of 'parents', 'population', and 'offspring' for a problem where I have a custom individual.candidate genome (akin to Genetic Programming). I also have unique ids for the individuals. As you can see, while 'population' contains one copy of each individual, 'parents' has several copies of individuals 1 and 4, which were selected for reproduction multiple times.

[DEBUG 2023-07-21 10:09:01] Initial state of the parents: [DEBUG 2023-07-21 10:09:01] 1:'u' : '(cos(P)/(P-P))' [DEBUG 2023-07-21 10:09:01] 4:'u' : 'sin(cos(((P-P)+(L/P))))' [DEBUG 2023-07-21 10:09:01] 1:'u' : '(cos(P)/(P-P))' [DEBUG 2023-07-21 10:09:01] 1:'u' : '(cos(P)/(P-P))' [DEBUG 2023-07-21 10:09:01] 4:'u' : 'sin(cos(((P-P)+(L/P))))' [DEBUG 2023-07-21 10:09:01] 5:'u' : 'max(cos(min(L,(0.6575))),min(min(L,L),((0.6309)+(0.5804))))' [DEBUG 2023-07-21 10:09:01] 9:'u' : '(((min(P,P)-max(L,P))*(min(L,(-0.9861))+min(L,P)))/(-0.6634))' [DEBUG 2023-07-21 10:09:01] 4:'u' : 'sin(cos(((P-P)+(L/P))))' [DEBUG 2023-07-21 10:09:01] 2:'u' : '(max(L,L)+max((-0.0879),(0.9665)))' [DEBUG 2023-07-21 10:09:01] 8:'u' : 'sin(min(max(L,(0.7915)),(P/(0.2088))))'

[DEBUG 2023-07-21 10:09:01] Initial state of the population: [DEBUG 2023-07-21 10:09:01] 0:'u' : 'cos(sin((-0.8001)))' [DEBUG 2023-07-21 10:09:01] 1:'u' : '(cos(P)/(P-P))' [DEBUG 2023-07-21 10:09:01] 2:'u' : '(max(L,L)+max((-0.0879),(0.9665)))' [DEBUG 2023-07-21 10:09:01] 3:'u' : 'sqrt(sin(min((L-P),max(P,P))))' [DEBUG 2023-07-21 10:09:01] 4:'u' : 'sin(cos(((P-P)+(L/P))))' [DEBUG 2023-07-21 10:09:01] 5:'u' : 'max(cos(min(L,(0.6575))),min(min(L,L),((0.6309)+(0.5804))))' [DEBUG 2023-07-21 10:09:01] 6:'u' : 'max(max((0.7001),P),sin(P))' [DEBUG 2023-07-21 10:09:01] 7:'u' : '(cos((0.1225))sqrt(L))' [DEBUG 2023-07-21 10:09:01] 8:'u' : 'sin(min(max(L,(0.7915)),(P/(0.2088))))' [DEBUG 2023-07-21 10:09:01] 9:'u' : '(((min(P,P)-max(L,P))(min(L,(-0.9861))+min(L,P)))/(-0.6634))'

[DEBUG 2023-07-21 10:09:01] Initial state of the offspring: [DEBUG 2023-07-21 10:09:01] 10:'u' : '(((P-P)+(L/P))/(P-P))' [DEBUG 2023-07-21 10:09:01] 11:'u' : '(sin(P)/min(P,L))' [DEBUG 2023-07-21 10:09:01] 12:'u' : 'sin(cos(min((P-P),max(P,P))))' [DEBUG 2023-07-21 10:09:01] 13:'u' : 'L' [DEBUG 2023-07-21 10:09:01] 14:'u' : 'sin(min(max(L,(0.7915)),(P/(0.2088))))'

— Reply to this email directly, view it on GitHub https://github.com/aarongarrett/inspyred/issues/27, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALRPXKAEQLWHLIGLFJGBQTXRI5DLANCNFSM6AAAAAA2SQUD6E . You are receiving this because you are subscribed to this thread.Message ID: @.***>

albertotonda commented 1 year ago

I see! However, I still think the implementation should be modified, for two reasons:

  1. When manuscripts in EAs talk about "parent population", I think they mean "whole population at the current generation, before creating offspring", rather than "only the individuals selected for reproduction". I tried to find some pseudo-code to support my claim, and the first thing I stumbled upon is from Hansen's introduction to Evolution Strategies, http://www.cmap.polytechnique.fr/~nikolaus.hansen/es-overview-2015.pdf :

image

Here, it is clear that the population P at line 7, just before selection, is the union of the old population and the new offspring. Even from another point of view, since mu refers to the size of the whole population, if only the parent individuals selected for reproduction were included in the global archive before selection, the size of the population at the end of a generation, just before selection, would not be of size mu+lambda, but rather (size of an archive with just individuals selected for reproduction)+lambda. The size of such an archive would vary with the type of variators applied (e.g. 2 individuals for each activation of crossover(s), 1 individual for each activation of a mutation).

  1. Even if the current interpretation were correct, the 'parent' list can contain multiple copies of the same individual. I don't think this is the intended behavior, as it can rapidly degrade population diversity.
markcoletti commented 1 year ago

Alberto is correct. For an ES(µ+λ) the pool of prospective parents and created offspring are poured into the same bucket and then truncation selection yields a new pool of prospective parents of size |µ|. Combining just the selected parents and offspring introduces a new source of genetic drift -- that is, parents not selected drop out of sight even if they happen to be some of the best.

This is an ES example from LEAP:

    while generation_counter.generation() < max_generation:
        offspring = pipe(parents,
                         ops.random_selection,
                         ops.clone,
                         mutate_gaussian(std=context['leap']['std'], expected_num_mutations=1),
                         ops.evaluate,
                         # create the brood
                         ops.pool(size=len(parents) * BROOD_SIZE),
                         # mu + lambda
                         ops.truncation_selection(size=len(parents),
                                                  parents=parents))
        parents = offspring

pipe() takes a data source as its first argument, in this case the parents, from which offspring are accumulated via the ops.pool() operator. Then those offspring are passed to the truncation_selection() operator where the pool of prospective parents, including those that were not selected by happenstance for creating offspring, are combined and truncated by fitness; this way we do not lose the best parents (unless they happen to all be worse than the offspring). Converting this snippet to an ES(µ,λ) would simply entail dropping the parents=parents argument to the truncation selection operator.

aarongarrett commented 1 year ago

OK, I'm satisfied. If you want to make the pull request, I'll approve it.

AG

-- Aaron Garrett

On Fri, Jul 21, 2023 at 4:35 PM Mark Coletti @.***> wrote:

Alberto is correct. For an ES(µ+λ) the pool of perspective parents and created offspring are poured into the same bucket and then truncation selection yields a new pool of prospective parents of size |µ|. Combining just the selected parents and offspring introduces a new source of genetic drift -- that is, parents not selected drop out of sight even if they happen to be some of the best.

This is an ES example from LEAP:

while generation_counter.generation() < max_generation:
    offspring = pipe(parents,
                     ops.random_selection,
                     ops.clone,
                     mutate_gaussian(std=context['leap']['std'], expected_num_mutations=1),
                     ops.evaluate,
                     # create the brood
                     ops.pool(size=len(parents) * BROOD_SIZE),
                     # mu + lambda
                     ops.truncation_selection(size=len(parents),
                                              parents=parents))
    parents = offspring

pipe() takes a data source as its first argument, in this case the parents, from which offspring are accumulated via the ops.pool() operator. Then those offspring are passed to the truncation_selection() operator where the pool of prospective parents, including those that were not selected by happenstance for creating offspring, are combined and truncated by fitness; this way we do not lose the best parents (unless they happen to all be worse than the offspring). Converting this snippet to an ES(µ,λ) would simply entail dropping the parents=parents argument to the truncation selection.

— Reply to this email directly, view it on GitHub https://github.com/aarongarrett/inspyred/issues/27#issuecomment-1646211841, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALRPXNGLV77MZWUXOJ7TTLXRLRZ7ANCNFSM6AAAAAA2SQUD6E . You are receiving this because you commented.Message ID: @.***>

sanjayankur31 commented 8 months ago

Fixed in #28