GMUEClab / ecj

ECJ Evolutionary Computation Toolkit
http://cs.gmu.edu/~eclab/projects/ecj/
123 stars 42 forks source link

Missing crucial preselection step for lexicase selection #58

Open erp12 opened 5 years ago

erp12 commented 5 years ago

We haven't published it much, but there is an optimization of lexicase selection that avoids the worst case runtime without impacting selection which I can't find in the source code (although I might just be missing it). Without this optimization, lexicase can sometimes run orders of magnitude slower.

If multiple individuals have the same behavior, lexicase will require considering all training cases and will eventually randomly select between these individuals. The optimization is to filter the initial candidates down by randomly selecting one individual for each behavior up front.

For reference, it is included in the pyshgp implementation.

thelmuth commented 5 years ago

An important note is that this step needs to be done before every selection, and cannot be performed only once per generation. If you only did it once per generation, then only one individual per error vector could be selected. If you do it for every selection, you get the exact same results as if you hadn't done it, but get significant speedups.