daffidwilde / edo

A library for generating artificial datasets through genetic evolution.
https://doi.org/10.1007/s10489-019-01592-4
MIT License

Fitness is not being recorded properly #22

Closed: daffidwilde closed this issue 5 years ago

daffidwilde commented 6 years ago

In run_algorithm, fitnesses are being calculated properly, but in every iteration bar the last, only the best_prop proportion of the fitnesses ends up in the history of all fitnesses. Likewise with the populations.

As I'm writing this, I think I know why. The best proportion of individuals are selected and taken from a population using population.pop(best), as required. I hadn't accounted for the fact that this mutates the population in place, so the populations/scores already appended to the history get edited as well. Presumably the history just holds references to the same objects rather than copies.
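
For illustration, here is the same aliasing in isolation (a minimal sketch of the behaviour, not using edo itself):

>>> history = []
>>> population = [1, 2, 3, 4]
>>> history.append(population)  # appends a reference to the list, not a copy
>>> best = [population.pop(0), population.pop(0)]  # selection mutates that same list
>>> history[0]  # the "recorded" population has shrunk too
[3, 4]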

Anyway, I will think of a workaround for this one... but here is an example of the kind of behaviour I'm experiencing.

Using the fitness-recording-issue branch and its added print statements:

>>> import numpy as np
>>> import genetic_data as gd

>>> def x_squared(df):
...     return df.iloc[0, 0] ** 2

>>> pop, fit, all_pops, all_fits = gd.run_algorithm(
...     fitness=x_squared,
...     size=100,
...     row_limits=[1, 1],
...     col_limits=[1, 1],
...     max_iter=5,
...     best_prop=0.5,
...     maximise=False,
...     seed=0
... )
the best score in this iteration is 0.02471670450862218
the number of scores is 100 

the best score in this iteration is 0.02471670450862218
the number of scores is 100 

the best score in this iteration is 0.02471670450862218
the number of scores is 100 

the best score in this iteration is 0.02471670450862218
the number of scores is 100 

the best score in this iteration is 0.02471670450862218
the number of scores is 100 

the best score in this iteration is 0.02471670450862218
the number of scores is 100 

the best scores: 
 [28.57371438419073, 10.13474506798552, 2.0926246987460893, 0.273387549544826, 0.14578571748701574, 0.02471670450862218]

Which certainly does not make any sense. Then, to double-check everything:

>>> for scores in all_fits:
...     print('the best', np.min(scores))
...     print(len(scores), '\n')
the best 28.57371438419073
50 

the best 10.13474506798552
50 

the best 2.0926246987460893
50 

the best 0.273387549544826
50 

the best 0.14578571748701574
50 

the best 0.02471670450862218
100
daffidwilde commented 6 years ago

Feel free to play around with the best_prop and lucky_prop parameters and you will see how the recorded fitness scores/populations change.

It makes sense that the last iteration is correct of course since nothing is popped from it.

A simple solution would be to take a copy of each population and its fitness scores during the selection process. I think I can just do that with copy.deepcopy. Any other suggestions are welcome as always.
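
As a rough sketch of what that could look like (the select_best name and its signature here are placeholders for illustration, not the actual function in the library):

import copy

def select_best(population, fitnesses, best_prop):
    """Pick out the best individuals without mutating anything already recorded."""
    # Work on copies so the population/scores appended to the history stay intact.
    population = copy.deepcopy(population)
    fitnesses = list(fitnesses)

    n_best = int(best_prop * len(population))
    # Indices of the n_best lowest scores (minimisation).
    best_idxs = sorted(range(len(fitnesses)), key=fitnesses.__getitem__)[:n_best]

    best_pop = [population[i] for i in best_idxs]
    best_fits = [fitnesses[i] for i in best_idxs]
    return best_pop, best_fits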

daffidwilde commented 6 years ago

Implemented this fix in #23.

daffidwilde commented 6 years ago

There may be a better way of fixing this, or it could be avoided by refactoring down the line. Reopening.