Compatibility_Threshold vs Target_Number_Of_Species

luke-saunders commented 2 years ago

Hello All,

I have been working on a feature for maintaining a 'target_number_of_species' within a defined min/max range. It achieves this by altering the 'compatibility_threshold' each generation.

Question: Is this feature beneficial? ... or am I just causing adverse conditions for the algorithm by frequently changing the compatibility_threshold?

th555 commented 1 year ago

In my opinion it would be a useful feature to have since one wouldn't have to manually tweak the compatibility threshold to achieve a sensible number of species. And it's used in other NEAT codebases, such as here

luke-saunders commented 1 year ago

I briefly worked on this a couple years ago. It works to some extent, but more work is required.

The calculation of when to change compatibility_threshold, and by how much is not ideal.

Reducing the number of species can be tricky. One example is species stagnation - it will hold onto species for a period of generations, therefore changes to the compatibility_threshold (during this period) will not yield a predictable change to the species count.
Ramping up species is a little more predictable, but changes to the compatibility_threshold can take a generation (or more) to observe increases to the species count.
My initial implementation for handling the challenge was to wait x generations between changes to compatibility_threshold (review_frequency_inc, review_frequency_dec). It's not an ideal solution, but it does work to some degree. The main challenge is finding values that yield the best results.

Feel free to have a look. Maybe someone will be able to improve the implementation.

https://github.com/luke-saunders/neat-python/tree/TargetNumberOfSpecies

Explanation of the changed config values:

[DefaultSpeciesSet] target_number_of_species --- Target number of species compatibility_threshold_init --- Initial value for compatibility_threshold compatibility_threshold_min --- Lower limit for compatibility_threshold value compatibility_threshold_max --- Upper limit for compatibility_threshold value compatibility_modifier --- Strength of a compatibility_threshold change review_frequency_inc --- Wait x generations until an increase is applied review_frequency_dec --- Wait x generations until a decrease is applied

On Fri, 28 Jul 2023 at 22:14, th555 @.***> wrote:

In my opinion it would be a useful feature to have since one wouldn't have to manually tweak the compatibility threshold to achieve a sensible number of species. And it's used in other NEAT codebases, such as here https://github.com/CreativeMachinesLab/softbotEvolution/blob/d47e2ef0ef3ec14c68e770aa553a748512397069/base/hyperneat/HyperNEAT/NEAT/src/NEAT_GeneticPopulation.cpp#L197

— Reply to this email directly, view it on GitHub https://github.com/CodeReclaimers/neat-python/issues/246#issuecomment-1655579849, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLTFOLSQEPMSMH5XNB77QDXSOUIVANCNFSM55EH2LGA . You are receiving this because you authored the thread.Message ID: @.***>

th555 commented 1 year ago

Thanks for the details, it is (as always) more complicated than I thought :)

CodeReclaimers commented 1 year ago

I'm thinking of adding a speciation scheme that just uses k-means, so the only setting would be the number of species you want. The individuals would be grouped together into a fixed number of clusters based on genomic distance, with each cluster being a species.

Does that sound like it would do most of what you were thinking about, @luke-saunders?

luke-saunders commented 1 year ago

To be honest, I don't know enough about the speciation algorithm to make an informed decision. At the time, I had concerns about implementing changes that adversely impacted speciation, therefore put the work on hold until I had time for a deep dive.

On Mon, 31 Jul 2023 at 02:43, CodeReclaimers @.***> wrote:

I'm thinking of adding a speciation scheme that just uses k-means, so the only setting would be the number of species you want. The individuals would be grouped together into a fixed number of clusters based on genomic distance, with each cluster being a species.

Does that sound like it would do most of what you were thinking about, @luke-saunders https://github.com/luke-saunders?

— Reply to this email directly, view it on GitHub https://github.com/CodeReclaimers/neat-python/issues/246#issuecomment-1657216995, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLTFON4E2S5P47J55UOYKTXS2FLXANCNFSM55EH2LGA . You are receiving this because you were mentioned.Message ID: @.***>

Gabriel-Kissin commented 1 year ago

I'm thinking of adding a speciation scheme that just uses k-means, so the only setting would be the number of species you want. The individuals would be grouped together into a fixed number of clusters based on genomic distance, with each cluster being a species.

Sounds like a great idea. The only question is whether a hybrid approach would be an option - i.e. use some clustering algorithm to ensure a minimum number of species, and at the same time use compatibility_threshold to ensure that if there are genuinely different species of genome, that they get divided into separate species. That way you could for example specify you want a minimum of 5 species, while also allowing for more if the generated genomes warrant it

How would the clustering work? Do you have some d-dimensional space representing genomic features, in which each genome is represented? If so k-means should work, if the distances in all dimensions are of comparable importance to speciation (as kmeans makes spherical clusters). Alternatively (and this is the idea I have in my head for some reason) there is just a distance matrix representing genomic distances between all pairs of genomes - in which case there are clustering algorithms which can work on the back of such a distance matrix. Either way the concept of using clustering for speciation sounds like a good one, to protect whatever diversity there is in the population

Gabriel-Kissin commented 1 year ago

Worth pointing out that the tfne library which implements the CoDeepNEAT algorithm (which is a successor to NEAT, for deep NNs) has functionality similar to what is being discussed here. As well as fixed-distance speciation, it also offers dynamic speciation. Practically this means that you just set the number of species you want, and it figures out the distance threshold accordingly.

CodeReclaimers / neat-python

Compatibility_Threshold vs Target_Number_Of_Species #246