ShuhuaGao / geppy

A framework for gene expression programming (an evolutionary algorithm) in Python
https://geppy.readthedocs.io/en/latest/
GNU Lesser General Public License v3.0

Any settings or solutions for 'De-duplication' in the Hall of Fame? #35

Closed RobinWenqian closed 3 years ago

RobinWenqian commented 3 years ago

First of all, thanks for developing this great work. I have a problem in my project. Suppose I set the size of the hof to 100 so that it keeps the best 100 individuals every round; however, many of them are duplicates, and there are only 10 unique ones in the hof. Are there any settings or solutions for de-duplication? Looking forward to your reply.

ShuhuaGao commented 3 years ago

The HallOfFame is provided by deap, and its update method does not care about duplication. The relevant line is https://github.com/ShuhuaGao/geppy/blob/e9a6459e13694baf738ebf4b555ab3cf48d03960/geppy/algorithms/basic.py#L105

You can customize the behaviour in two ways:

  1. Provide a subclass of HallOfFame and override its update method to check for duplication.
  2. Or, more easily, replace the above update line in gep_simple with two lines:
    • Check for duplication.
    • Call insert if the individual is not a duplicate.
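
The duplicate check described above can be sketched roughly as follows. This is a dependency-free illustration and not deap's HallOfFame: the class name UniqueHallOfFame, its key parameter, and the fitness callable are hypothetical. It assumes that str(individual) produces the same text for duplicated individuals and that a higher fitness is better.

```python
class UniqueHallOfFame:
    """Hypothetical stand-in for a hall of fame that skips duplicates."""

    def __init__(self, maxsize, key=str):
        self.maxsize = maxsize
        self.key = key      # maps an individual to a hashable identity (assumed: str)
        self.items = []     # kept sorted, best individual first
        self._seen = set()  # keys of the individuals currently stored

    def update(self, population, fitness):
        """Add unseen individuals, then keep only the best `maxsize` of them."""
        for ind in population:
            k = self.key(ind)
            if k in self._seen:
                continue    # duplicate of a stored individual: do not insert
            self._seen.add(k)
            self.items.append(ind)
        # Assumption: larger fitness values are better.
        self.items.sort(key=fitness, reverse=True)
        for dropped in self.items[self.maxsize:]:
            self._seen.discard(self.key(dropped))
        del self.items[self.maxsize:]


# Toy usage with strings standing in for evolved expressions.
scores = {"x + 1": 0.9, "x * x": 0.7, "x": 0.4, "x - 2": 0.1}
hof = UniqueHallOfFame(maxsize=3)
hof.update(["x + 1", "x + 1", "x * x", "x", "x * x", "x - 2"], fitness=scores.get)
print(hof.items)
```

With a real geppy run, the same idea would go either into an overridden update method or directly before the insert call, with the individual's own fitness used in place of the scores dictionary.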

RobinWenqian commented 3 years ago

Thanks for your quick reply. Alternatively, is there any way to de-duplicate at the population generation stage? Say, I maintain a set of all the equations I have already evaluated and, after each generation, delete the individuals that are already in that set.
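
A minimal sketch of the set-based idea, under stated assumptions: the helper drop_seen is a hypothetical name, not part of geppy or deap, and str(individual) is assumed to identify an equation uniquely.

```python
def drop_seen(population, seen, key=str):
    """Return the individuals whose key is not in `seen`, recording them as seen."""
    fresh = []
    for ind in population:
        k = key(ind)
        if k not in seen:
            seen.add(k)
            fresh.append(ind)
    return fresh


# Toy usage with strings standing in for evolved equations.
seen = set()
gen1 = drop_seen(["x + 1", "x * x", "x + 1"], seen)
gen2 = drop_seen(["x * x", "x - 2"], seen)
print(gen1, gen2)
```

Note that this shrinks the population each round, so the removed slots would have to be refilled in some way to keep the population size constant.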

ShuhuaGao commented 3 years ago

Well, since evolution is random, I don't think there are many duplicates among the generated individuals, right? Besides, these replicated individuals may sometimes help guide the direction of the search. Moreover, once evolution converges, it is inevitable that the population contains many similar individuals. Overall, I think there is no need to remove duplicated individuals from the population.

In the hall of fame, we may want multiple different solutions that share the best fitness. That's why removing duplicates there is meaningful. Note that the operation of HallOfFame does not interfere with the main evolution.

If your objective is to maintain a diverse population to avoid premature convergence, the keyword to search for is "diversity maintenance" in evolutionary computation. There are many methods in the literature.
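
Before reaching for a diversity maintenance method, it may help to check whether diversity is actually a problem. The sketch below only measures the fraction of unique individuals in a population; it is a monitoring aid, not a diversity maintenance method itself. The function unique_ratio is a hypothetical name, and str(individual) is assumed to identify duplicates.

```python
def unique_ratio(population, key=str):
    """Fraction of distinct individuals in the population (0.0 if it is empty)."""
    if not population:
        return 0.0
    return len({key(ind) for ind in population}) / len(population)


# Toy usage: three distinct expressions among four individuals.
ratio = unique_ratio(["x + 1", "x + 1", "x * x", "x"])
print(ratio)  # 0.75
```

A ratio that drops sharply over the generations suggests the population is collapsing onto a few individuals.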

RobinWenqian commented 3 years ago

I really appreciate your advice. I will search for methods as you suggest.