basvanwesting / genetic-algorithm

A genetic algorithm implementation for Rust
Other
16 stars 0 forks source link

Continuous alleles should allow for any num type #3

Closed mmastrac closed 2 weeks ago

mmastrac commented 1 month ago

I've been using this library to tune constants for a connection pool and it's been quite useful. One small nitpick is that there's no concept of a continuous-but-discrete allele for integral values that may be useful in places like this.

Ideally ContinuousGenotype would take a num-like parameter like so:

ContinuousGenotype::<f32>::builder() ...
ContinuousGenotype::<usize>::builder() ... 
ContinuousGenotype::<u8>::builder() ... 
basvanwesting commented 1 month ago

I'll look into this. I see a DiscreteGenotype with a fully enumerated range of possible values would probably not work well in this case, as the mutations would jump all over the place.

mmastrac commented 1 month ago

Using a discrete allele was my first attempt and it took a moment to understand why I wasn't getting the right behaviour! It could also be possible to have a mutation distance there, I suppose.

My workaround for now is using floats and an explicit casting step, but there's probably a lot of extra work happening with genes that round to the same value.

Apologies for so many reports -- I found some time tonight to collect my thoughts after using it this last few weeks. The library's API is excellent and the code is very understandable.

basvanwesting commented 1 month ago

No problem. It is great to get some feedback. I think your suggestion of a mutation distance would work best. I'll look into it.

basvanwesting commented 1 month ago

I'm thinking; a mutation distance is basically the same as the concept of a neighbouring allele (which is implemented already in the hill-climb strategy). So you basically end up using a hill-climbing strategy instead of an evolution strategy. The initial population will behave like an evolution strategy, but it will quickly collapse to a hill-climbing strategy once the population becomes relatively uniform. Hill climbing is much faster than evolution, but much more likely to end up in a local maximum/minimum. Have you tried using the hill-climbing strategy for your use case?

See: HillClimb

Maybe there is still added value to implementing this as a mutation type, creating a hybrid evolution/hill-climb strategy. The initial evolution strategy would give a better spread of the solution space and you are less likely to end up in a local maximum/minimum and the hill-climb stage at the end would zoom in efficiently to the local solution.

What do you think of this?

basvanwesting commented 1 month ago

Using a discrete allele was my first attempt and it took a moment to understand why I wasn't getting the right behaviour! It could also be possible to have a mutation distance there, I suppose.

I looked into the implementation and both Discrete and Continues mutate completely randomly within respectively the set or range. So I don't immediately see why the discrete approach would behave worse than the "continues bin" approach. Can you elaborate a bit on your findings here as well?

mmastrac commented 1 month ago

Oh! -- maybe this isn't working as expected. I was expecting that the neighbor ranges of the continuous allele were used as a maximum mutation distance, rather than the set of items chosen.

This might explain some of the results I was seeing.

My expectation from the API would be that a mutation was a distance, rather than an absolute sample. I think I must have been imagining the improvement because I can clearly see how it's not working that way :).

I think that allowing for a mutation-distance-distribution could allow for the best of both worlds -- shorter hops would work like hill-climbing, longer hops more like major mutation events.

I can ponder this a bit. In my case I've got islands of stability, so optimization needs to alternate between randomly throwing itself around the solution space and hill-climbing. It might be worth using evolve to generate candidates for hill-climbing.

basvanwesting commented 1 month ago

I will look at adding ContinuousGenotype::\<u8> (and friends as generic) and allow the concept of distance there. In addition I think we should maybe rename DiscreteGenotype to EnumGenotype (to express there is no distance, so distance would behave as random).

I will rename the MutateOnce to MutateSingleAlleleRandom (or something like that). And align MutateTwice to MutateMultiAlleleRandom with a parameter how many. When this is done we can add MutateSingleAlleleDistance with a distance parameter (some uniform max distance of the type). This way random and distance are neatly separated.

@mmastrac do you think this would be the right way to go?

mmastrac commented 1 month ago

This would be fantastic! I should try creating a custom mutation for my use case to test a few of these ideas out, but I'm pretty sure that MutateSingleAlleleDistance would give me what I need.

basvanwesting commented 1 month ago

Also your requested Regular reporting callbacks during evolution #4 would probably by quite useful for seeing how the path to the solution behaves.

basvanwesting commented 1 month ago

@mmastrac I added MutateSingleGeneDistance, only usable by ContinuousGenotype for now. It takes a allele_distance_range which bounds the mutation (instead of the full allele_range in MutateOnce). It is ~5x more efficient in the example/evolve_continuous (but that is a very simple example with a single very directed solution space). You can try this one out. Can you use main, or do you need a published version?

Adding ContinuousGenotype::\<u8> (and friends as generic) is quite complicated, so please stick with ContinuousGenotype with implicit f32 for now. With the proper ranges and mutation distance, this should not be very inefficient with regards to a u8 implementation.

I guess Regular reporting callbacks during evolution #4 should be next for feedback and testing purposes. As I still don't quite understand why Discrete and Continuous genotypes had such a different performance in your use case.

basvanwesting commented 1 month ago

There was an error in the implementation. With fix it is only ~4x more efficient in example/evolve_continuous.

basvanwesting commented 2 weeks ago

Released v0.9.0 which implements any num type. See CHANGELOG for the many changes which resulted from this generalization.

Notable changes for issue context:

mmastrac commented 2 weeks ago

Can't wait to try it out -- hopefully soon! Thanks for all the awesome work on this library.