giacomelli / GeneticSharp

GeneticSharp is a fast, extensible, multi-platform and multithreading C# Genetic Algorithm library that simplifies the development of applications using Genetic Algorithms (GAs).
MIT License
1.26k stars 330 forks

New feature: Metaheuristics #87

Closed jsboige closed 1 year ago

jsboige commented 3 years ago

This is a very large pull request that introduces a new powerful feature to GeneticSharp: Metaheuristics.

It comes with numerous fixes, tweaks and improvements that are not necessarily related, so to make it easier to review I will also open 2 smaller pull requests, related to issue #82, whose changes are contained in this one. Hopefully we can validate those preparatory intermediate pull requests before this one is fully validated. Those pull requests are #89 and #88.

Note that there isn't much code coverage yet, comments are lacking too, and there is probably more work to do before it can be fully integrated into the master branch, hence my proposal to merge it into develop first. However, I believe it is clean enough for an initial review, and there are important heavy unit tests passing that should clear the way. Also, I will have to postpone working on it for about a week, and since the upgrade has started yielding very promising results, I figured you might be willing to give it a try and give me your feedback.

Now about the feature itself: Metaheuristics are a new kind of meta-operator that controls the evolution process dynamically through the other native operators. In the last 20 years, a lot of successful metaheuristics were introduced, often based on natural metaphors. They manage to guide evolution through a more finely tuned exploration/exploitation trade-off, and prevent early collapse to a local optimum, which standard GAs, and GeneticSharp among them, exhibit (see issue #47 for a discussion about it).

In order to integrate known Metaheuristics and the corresponding capabilities, I believe the mealpy repository is going to be a good reference, since it implements pretty much all of them.

For now, I have set myself the objective of getting one well-known Metaheuristic working, namely the Whale Optimization Algorithm (WOA), for which I strictly followed the simpler original Matlab version. The initial result can be seen in a factory class where I intend to add other well-known Metaheuristics once the initial integration is cleared. In order to make using Metaheuristics easier, I have started providing the corresponding fluent API, which is demonstrated through that initial example and will be used throughout the other factory examples.

Another, distinct Metaheuristic, which I also implemented and tested, is the Eukaryote Metaheuristic. It is of a different kind, because it leverages the available dynamic operator mechanism at the chromosome level rather than at the level of the evolution operators themselves, as the other Metaheuristics do. My work on it was largely inspired by a publication introducing geometric crossover for Sudokus. The corresponding metaheuristic allows splitting a regular 81-cell chromosome into a karyotype of nine 9-cell row sub-chromosomes, and then applying to each of them ordered crossovers and mutations that preserve permutations. This is illustrated both in the unit tests and in the updated GTK# application controller, though in the latter you have to select the crossovers and mutation manually using the GUI. The new Eukaryote-based chromosome outperforms the existing Cells chromosome and prevents collapse to suboptimal solutions in certain cases, which was the intended behaviour. Note that it is still inferior to the recently improved existing PermutationChromosome, which has a reduced search space and doesn't need the chromosome-splitting overhead.
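To make the karyotype idea concrete, here is a minimal, hypothetical sketch in plain C# (illustrative names only, not the actual Eukaryote MetaHeuristic API from this PR) of splitting an 81-cell individual into nine 9-gene rows and merging them back, so that permutation-preserving operators can act on each row independently:

```csharp
using System;
using System.Linq;

// Hypothetical helper: names and layout are illustrative only.
public static class Karyotype
{
    // Splits a flat 81-gene Sudoku individual into nine 9-gene row sub-chromosomes.
    public static int[][] Split(int[] genes, int rows = 9)
    {
        int rowLength = genes.Length / rows;
        var karyotype = new int[rows][];
        for (int r = 0; r < rows; r++)
        {
            karyotype[r] = new int[rowLength];
            Array.Copy(genes, r * rowLength, karyotype[r], 0, rowLength);
        }
        return karyotype;
    }

    // Merges the row sub-chromosomes back into a single flat individual.
    public static int[] Merge(int[][] karyotype) =>
        karyotype.SelectMany(row => row).ToArray();
}
```

Each 9-gene row can then be handled by an ordered, permutation-preserving crossover or mutation before being merged back, which is what keeps the row permutations from being destroyed.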

Those two initial test cases guided me in designing all the capabilities needed for their implementation, and should provide robust examples of how to leverage the new features: there is now a vast range of options available through the combination of the Metaheuristics building blocks.

There is a lot of code coverage to add now, together probably with a couple of bugs to fix, and I am willing to do most of the clean-up, though of course if you appreciate the contribution, I don't mind doing that work collaboratively.

About modifications to the existing code base, I tried to keep the footprint relatively light. I encourage you to compare benchmarks with other branches; there shouldn't be too much impact, at least when using the default pass-through Metaheuristic. When using a relatively heavy Metaheuristic with sub-components, like the WOA, the overhead is more severe (more than 2x), but it is largely compensated by the fact that the search reaches much better fitness than the original GA was capable of, given the suboptimal modal collapse mentioned above, and it does so with populations an order of magnitude smaller (I encourage you to analyse the initial compound unit test: the Metaheuristic-based GA vastly outperforms the regular GA). That doesn't mean there isn't room for improvement. I believe the parameter-passing mechanism, though functional, might be worth refactoring both for speed and for fluent usage.

Some of the other technical discussion will probably concern the other 2 pull requests, which are preliminaries to this one, so I'll leave it here for now, with this already long description.

I hope this doesn't look too frightening, that you'll enjoy the contribution, and I'm definitely up for discussing whatever needs to be discussed.

Cheers

jsboige commented 3 years ago

I can see the continuous integration build failed because of C# language features. I guess this is because I'm using a more recent version of VS Community. As I mentioned, I won't have time in the coming days to get back to it, but I suppose these are minor adjustments.

In the meantime, I think it would be a good idea to upgrade a few versions in the project, which would probably get that build to pass. Most notably, I played a bit with your BenchmarkDotNet project, and the latest BenchmarkDotNet version has breaking changes with your code, but I'd be willing to do the updates.

giacomelli commented 3 years ago

No problem using a newer C# version; we are using 7.0 and you are probably using 8.0.

jsboige commented 3 years ago

OK, it seems I was a bit overenthusiastic with the initial results.

Apparently the OnePointCrossover was having difficulties with the extended-size StubChromosome, making it look like the WhaleOptimizer was performing particularly well, but actually the UniformCrossover is killing it on this task, clearly outperforming the WhaleOptimizer.

This is not too unexpected though, since that chromosome's fitness landscape is pretty smooth without any local maxima, so it makes sense that with enough genetic mixing a standard crossover suffices to spread the population and find the maxima at the edges, whereas the helicoidal search pattern of the WhaleOptimizer doesn't bring much to the table.
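For context, here is a minimal sketch of the kind of baseline being compared against: a plain GA with UniformCrossover on a smooth, single-optimum landscape. It uses the released GeneticSharp API (v2.x GeneticSharp.Domain.* namespaces assumed), not the StubChromosome from the PR's tests:

```csharp
using System;
using GeneticSharp.Domain;
using GeneticSharp.Domain.Chromosomes;
using GeneticSharp.Domain.Crossovers;
using GeneticSharp.Domain.Fitnesses;
using GeneticSharp.Domain.Mutations;
using GeneticSharp.Domain.Populations;
using GeneticSharp.Domain.Selections;
using GeneticSharp.Domain.Terminations;

class SmoothLandscapeBaseline
{
    static void Main()
    {
        // One floating-point gene in [0, 100]; fitness is the gene value itself,
        // so the landscape is smooth and the optimum sits at the upper edge.
        var chromosome = new FloatingPointChromosome(
            new double[] { 0 }, new double[] { 100 },
            new int[] { 32 }, new int[] { 2 });

        var fitness = new FuncFitness(c =>
            ((FloatingPointChromosome)c).ToFloatingPoints()[0]);

        var ga = new GeneticAlgorithm(
            new Population(50, 100, chromosome),
            fitness,
            new EliteSelection(),
            new UniformCrossover(0.5f),   // enough mixing to spread the population
            new FlipBitMutation())
        {
            Termination = new FitnessStagnationTermination(100)
        };

        ga.Start();
        Console.WriteLine($"Best fitness: {ga.BestChromosome.Fitness}");
    }
}
```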

This brings 2 suggestions:

Anyway, I'm really gone now, for a couple of days. I hope the last commits addressing your comments were satisfactory, and that your code review is going smoothly.

jsboige commented 3 years ago

I was able to squeeze in another couple of hours since I was a bit frustrated with the latest results. I fixed an important bug with match randomization, and started reworking the fluent API/parameter-passing mechanism with Lambda Expression visitors. It's not functional yet, but I believe the hardest part is behind me.

ktnr commented 3 years ago

I very much like this library and I appreciate that there is effort to expand its capabilities and improve it. Thank you @jsboige for putting a lot of work in. Please do not take the following personally.

I would like to add a perspective from what I perceive to be the general consensus of the operations research community.

There are classical, well-established, tried and tested metaheuristics: Tabu-Search, Iterated Local Search, Greedy Algorithms, Simulated-Annealing, Evolutionary Algorithms, Genetic Algorithms, Ant-Colony-Optimization, and a few others.

In recent years there has been an explosion of "new" metaheuristics, including: Whale-Optimization, Eukaryote Metaheuristic, Grey Wolf Optimization, and much much more. Those "new" metaheuristics are almost universally frowned upon, especially from reputable researchers. Be sure to check out Sörensen, Kenneth. "Metaheuristics—the metaphor exposed." International Transactions in Operational Research 22, no. 1 (2015): 3-18. I suggest to read the Abstract and the Conclusion, and then dive in.

The argument is that the "new" metaheuristics are not, in fact, new. They just reheat and repackage concepts, ideas, operators, and subroutines from classical metaheuristics under a new name. While among all those repackaged ideas there is some "truly innovative research of high quality" (Sörensen, 2015), it is hard to filter it out from all the noise, and hard to ascribe credibility to those new ideas. In rare exceptions, some of the few good, innovative ideas can also fit nicely into existing, classical frameworks and improve them, but they should not be sold as new under a different name.

Why did/does this happen? In short: easy publications - an author can claim that it is "new".

More options and fancier algorithms do not necessarily perform better, as demonstrated e.g. in Ruiz and Stützle (2007). Also, a plethora of options (possibly multiple options with different names that ultimately lead to the same outcome) does not make it easier to use a given algorithm/framework.

I am not a maintainer of this library, so I have no say in what gets merged and what does not. It comes down to what @giacomelli thinks. I just wanted to put this perspective out there, as I fear that starting to accept additions like this could ultimately deteriorate the quality of this library (and create a lot of reviewing and maintenance work).

I do have some questions and suggestions:

  1. Regarding

    introduces a new powerful feature to GeneticSharp: Metaheuristics.

    A genetic algorithm is already deemed a metaheuristic. This makes it hard to know what the purpose of your proposed Whale Optimization and Eukaryote Metaheuristic really is, given that they use and inherit from the GA base classes. Is it guiding the search and "tuning" the GA's search parameters? It's hard to grasp from 4000 new lines of code. Maybe you could clarify.

  2. Can you clarify what exactly differentiates your proposed algorithms from classical ones?

  3. What are the similarities to classical algorithms?

  4. Do you see a pathway to integrate your contributions into this classical framework (Genetic Algorithms), maybe by letting the naming scheme reflect this? This requires working out points 1-3.

  5. If people want to extend GeneticSharp with "over-arching" (?) frameworks like the ones proposed, would it make sense to host those extensions in a different library? Maybe one that uses GeneticSharp as a submodule. People could then build all kinds of new frameworks/extensions on top of the stable core components of GeneticSharp. Maybe similar to the mealpy you mentioned. For inspiration, one could look into the plug-in structure that HeuristicLab uses.

jsboige commented 3 years ago

After some more cleanup, I believe the last set of unit test cases now better achieves the original goal, and discussion should be easier.

Work on the fluent API yielded 2 alternate ways to define lambdas with sub-parameters. This is demonstrated with 2 implementations of the WOA. The "reduced" version with implicit additional parameters should be faster, since it merges Lambda expressions.
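As a generic illustration of the Lambda-merging technique alluded to here (plain System.Linq.Expressions usage, not the PR's actual visitor), two lambdas can be fused into a single expression tree by substituting the outer lambda's parameter with the inner lambda's body:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper, not the PR's actual visitor: fuses outer(inner(x)) into one tree.
public static class LambdaMerger
{
    // Builds x => outer(inner(x)) as a single lambda, so it can be compiled once
    // instead of invoking two nested delegates at every operator call.
    // e.g. Merge<double, double, double>(x => x * 2, y => y + 1) yields x => (x * 2) + 1.
    public static Expression<Func<TIn, TOut>> Merge<TIn, TMid, TOut>(
        Expression<Func<TIn, TMid>> inner,
        Expression<Func<TMid, TOut>> outer)
    {
        var body = new ReplaceParameterVisitor(outer.Parameters[0], inner.Body)
            .Visit(outer.Body);
        return Expression.Lambda<Func<TIn, TOut>>(body, inner.Parameters[0]);
    }

    // Standard ExpressionVisitor that swaps one parameter node for another expression.
    private sealed class ReplaceParameterVisitor : ExpressionVisitor
    {
        private readonly ParameterExpression _parameter;
        private readonly Expression _replacement;

        public ReplaceParameterVisitor(ParameterExpression parameter, Expression replacement)
        {
            _parameter = parameter;
            _replacement = replacement;
        }

        protected override Expression VisitParameter(ParameterExpression node) =>
            node == _parameter ? _replacement : base.VisitParameter(node);
    }
}
```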

UniformCrossover, though more efficient on small stub chromosomes, is now outperformed on larger chromosomes.

I very much like this library and I appreciate that there is effort to expand its capabilities and improve it. Thank you @jsboige for putting a lot of work in. Please do not take the following personally.

Thanks, and no problem, your input is welcome. I may disagree with some of your claims, but I get the overall feeling, I think your concern is justified, and I'm trying to bring some answers.

In recent years there has been an explosion of "new" metaheuristics, including: Whale-Optimization, Eukaryote Metaheuristic, Grey Wolf Optimization, and much much more(...).

I believe we can summarize the general idea behind your references as: Metaheuristics should get a component-based approach rather than a bestiary of new metaphors. This is the approach I followed here. You can have a look at the factory where some of the bestiary names will end up. It's really all about the primitives.

  1. Regarding A genetic algorithm is already deemed a metaheuristic. This makes it hard to know what the purpose of your proposed Whale Optimization and Eukaryote Metaheuristic really is, given that they use and inherit from the GA base classes. Is it guiding the search and "tuning" the GA's search parameters? It's hard to grasp from 4000 new lines of code. Maybe you could clarify.

In the proposed addition, MetaHeuristics don't replace the GA base classes; rather, they add a set of primitive operators allowing the GA search to be controlled and parametrised dynamically. Whale Optimization and Eukaryote are not of the same kind: the former is a compound heuristic defined as an operator tree, the latter is a primitive that splits chromosomes into custom karyotypes to be chained with other sub-operators.
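To illustrate the idea in code, here is a purely hypothetical sketch of such a hook (names and signatures are illustrative only and are not the PR's actual types): a meta-operator consulted by the GA loop each generation to pick or re-parametrise the native crossover, with a pass-through implementation that leaves the configured operator untouched. Only ICrossover and IPopulation come from the released GeneticSharp API.

```csharp
using GeneticSharp.Domain.Crossovers;
using GeneticSharp.Domain.Populations;

// Hypothetical interface, not the PR's actual type: a per-generation hook over native operators.
public interface IMetaHeuristicSketch
{
    // Called once per generation before offspring are produced; may return the
    // configured crossover as-is, swap it, or re-parametrise it dynamically.
    ICrossover SelectCrossover(int generationNumber, IPopulation population, ICrossover configured);
}

// The "default" behaviour mentioned earlier in the thread: no change to the native workflow.
public sealed class PassThroughMetaHeuristicSketch : IMetaHeuristicSketch
{
    public ICrossover SelectCrossover(int generationNumber, IPopulation population, ICrossover configured)
        => configured;
}
```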

  2. Can you clarify what exactly differentiates your proposed algorithms from classical ones?

There is not really a "proposed algorithm", apart from the examples that helped build the feature; rather, these are the building blocks to implement many kinds of algorithms/named heuristics. It aims at the "chemistry rather than the alchemy" from one of your references.

  3. What are the similarities to classical algorithms?

See the previous answer.

  4. Do you see a pathway to integrate your contributions into this classical framework (Genetic Algorithms), maybe by letting the naming scheme reflect this? This requires working out points 1-3.

Well, I believe I worked out most of the difficulties. The contact surface with the existing implementation is mostly limited to the main GA routine, with linear and parallel executors.

  5. If people want to extend GeneticSharp with "over-arching" (?) frameworks like the ones proposed, would it make sense to host those extensions in a different library? Maybe one that uses GeneticSharp as a submodule. People could then build all kinds of new frameworks/extensions on top of the stable core components of GeneticSharp. Maybe similar to the mealpy you mentioned. For inspiration, one could look into the plug-in structure that HeuristicLab uses.

Mealpy should definitely be a reference (as a "result-oriented bestiary"), and HeuristicLab has some interesting components, but I found it harder to work with, maybe precisely because the hyper-parametrisation somehow gets in the way, whereas I tried to keep things tighter here, extending the clear architecture of GeneticSharp.

jsboige commented 3 years ago

I added more conclusive tests with fitness functions based on known difficult benchmark functions. Unless I missed something, it seems the current original WOA implementation now beats UniformCrossover consistently.
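As a hedged example of what such a "known difficult" benchmark function looks like when wired into GeneticSharp (released API only; the PR's own test functions and bounds are not reproduced here), the Rastrigin function, with its many regularly spaced local optima, can be expressed as a FuncFitness:

```csharp
using System;
using System.Linq;
using GeneticSharp.Domain.Chromosomes;
using GeneticSharp.Domain.Fitnesses;

static class RastriginFitnessFactory
{
    public static FuncFitness Create() => new FuncFitness(c =>
    {
        var x = ((FloatingPointChromosome)c).ToFloatingPoints();

        // Rastrigin: f(x) = 10n + sum_i (x_i^2 - 10 cos(2 pi x_i)); global minimum 0 at x = 0.
        var f = 10 * x.Length + x.Sum(xi => xi * xi - 10 * Math.Cos(2 * Math.PI * xi));

        // GeneticSharp maximizes fitness, so negate to turn minimization into maximization.
        return -f;
    });
}
```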

giacomelli commented 3 years ago

I think we need to take into account the points raised by @ktnr, because when I created GeneticSharp the idea was to have the classic structure of a genetic algorithm using good software engineering and thus keeping it as simple as possible.

I really like the contributions of these last pull requests, but I still don't have the knowledge to evaluate their soundness: I was never an artificial intelligence scientist, I am just an experienced programmer who has a basic knowledge of genetic algorithms. That said, I will have a lot of difficulty doing this code review, not in relation to the code itself, but as to what it does.

Perhaps the idea of having a child project, which could be led by @jsboige, would be better; it could evolve with more fluidity and speed.

My desire for GeneticSharp is to keep it as a simple framework for the development of genetic algorithms in C#, one that allows new projects to use it to perform more complex tasks. In the state the project is currently in, the idea would be to only accept pull requests for improvements to existing features and bug fixes.

What do you think?

jsboige commented 3 years ago

Here's a new attempt at finalizing this pull-request.

The Metaheuristic engine was isolated into a specific inherited GA class, such that the original one is left nearly untouched and the native workflow is restored to its previous state. I added significant code coverage, documentation and more clean-up/refactoring, which should make the pull request easier to understand. Also, part of those unit tests are helpers and grid-search stubs, which should help navigate the newly available parameters; accordingly, this is a work in progress. Some components were tightened, others were extended. I invested some time on the TSP problem, because geometric crossovers such as the WOA, which work well on benchmark functions, should have more problems on TSP without a proper geometric embedding. I believe there is research material available in that field, and I laid out the direction.

So, coming back to your question: I would really appreciate it if you could take this into GeneticSharp. I understand I could try to get some traction with a fork, but I believe that is going to be a pain down the road for both of us.

I really pushed hard to make this whole extension quite tight, and I have set many bounds in the unit tests that should allow us to monitor that. I know this library has more of an engineering touch. I believe I respected the general spirit of the code, and hopefully I can get you on board with the new features: I'm sure your skills will do wonders to complete my additions.

Now that I believe this is relatively stable, I would greatly appreciate some feedback. Also, let me know if you have any questions. Thanks in advance for giving a second look at my last commits.

In any case, would you consider doing the language feature upgrades we mentioned, so that the continuous integration gets past the build stage?

giacomelli commented 3 years ago

Okay, I really don't know what the best solution is (inside GeneticSharp or in a child project), but as your contribution is something additional and does not interfere with the basic workflow, I believe it is safe to continue.

Regarding the code review, I will not be able to do the review in the next two weeks.

About the error CS8107 ("The 'default literal' feature is not available in C# 7.0. Use language version 7.1 or higher."): you need to add the element below to the first PropertyGroup of the .csproj where the error occurs:

<LangVersion>7.1</LangVersion>
jsboige commented 3 years ago

Okay, I really don't know what the best solution is (inside GeneticSharp or in a child project), but as your contribution is something additional and does not interfere with the basic workflow, I believe it is safe to continue.

Thank you

Regarding the code review, I will not be able to do the review in the next two weeks.

No worries. I won't be able to put many hours, but I'll make sure tests pass, and I'll try to do some additional coverage.

About the error CS8107 ("The 'default literal' feature is not available in C# 7.0. Use language version 7.1 or higher."): you need to add the element below to the first PropertyGroup of the .csproj where the error occurs:


<LangVersion>7.1</LangVersion>

Thanks for the tip, I was able to get going.

jsboige commented 3 years ago

After a couple of fixes, I'm not sure why CI is failing after the last push. I previously had issues with some tests failing specifically on AppVeyor, but all tests seem to be passing now.

jsboige commented 3 years ago

After a couple of fixes, I'm not sure why CI is failing after the last push. I previously had issues with some tests failing specifically on AppVeyor, but all tests seem to be passing now.

OK, I saw there is a message about the build exceeding its time limit. I'm not sure what the cause is. The tests are admittedly longer than they used to be, but nothing that bad.

jsboige commented 3 years ago

Just a quick checkpoint in case you start having a look before this is properly finalized (I hope to be finished by the end of this week). Since we last exchanged, I continued the clean-up and refactoring, and provided some coverage. While fighting with AppVeyor to get certain tests to pass, I ended up extending the clean-up outside of the original scope.

Then, while conducting experiments with the WOA heuristic, I decided to implement the visual landscape explorer I was alluding to earlier, as part of the GTK# samples, in order to make convergence analysis easier. That new sample is not finished yet but is largely functional now, so I guess it is safe to experiment with. Using the equation chromosome and function fitness, that sample plots chromosome positions and allows visualizing convergence. A good benchmark of 10 known multidimensional functions was consolidated. Since the 2D case is usually not so interesting, 2 additions were planned:

Along the way there was some additional cleanup/refactoring of the GTK# app. In order to make tests possible, I added a pane with the option to use MetaHeuristics on top of the existing parameters. This is not functional yet, because using geometric crossovers implies problem-specific gene-value conversions that need to be addressed. I don't think it should take too long to implement either.

Then there are the metaheuristics themselves and some more tests to be done. I started deconstructing the useful components of the WOA within the grid-search tests, and I plan to do precise experiments with the help of the new visualizer.

My last effort will target the geometric embeddings and how to get an edge on the TSP using such methods. I don't think the current default WOA implementation is really helpful there, because of the large swings in gene values, which tend to be counterproductive on TSP. I'll deal with this last.

I believe I can get most of this sorted by this weekend, but if not, you've got the whole picture, and even though there are more code updates than initially planned, I believe it is in a cleaner state than it was 2 weeks ago, so the review shouldn't necessarily be harder.

I'll keep you posted with the finalizing commits.

giacomelli commented 1 year ago

I'm closing this PR because I believe it's out of the scope of GeneticSharp: to provide the basics of genetic algorithms.

Maybe this PR could become a "child" project of GeneticSharp, like "GeneticSharp-Contributions" or something like that.

Feel free to re-open it to let us further discuss this possibility.