Closed erupturatis closed 1 year ago
Thanks for the patch! I'm going to make an alternate, more extensive, change so the entire checkpoint/restore process works more consistently.
Basically, the checkpoint will be saved in the post_evaluate reporter call, after the genomes have been evaluated but before the next round of individuals has been generated. There will be documentation and examples showing what a given checkpoint file contains and how to load it and continue running from it.
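As a rough illustration of the plan above, here is a minimal sketch of a reporter that saves on post_evaluate. The class `PostEvaluateCheckpointer`, its `saved` dict, and the toy driver loop are all hypothetical, and the hook signatures are only assumed to mirror the reporter interface. It is not the actual change.

```python
import pickle


class PostEvaluateCheckpointer:
    """Hypothetical reporter that snapshots state right after evaluation."""

    def __init__(self):
        self.current_generation = None
        self.saved = {}  # generation label -> pickled snapshot (kept in memory here)

    def start_generation(self, generation):
        # Remember which generation is being evaluated.
        self.current_generation = generation

    def post_evaluate(self, config, population, species, best_genome):
        # The population passed in has fitness values and has not been replaced yet.
        snapshot = (self.current_generation, config, population, species)
        self.saved[self.current_generation] = pickle.dumps(snapshot)


# Toy driver loop standing in for the real run loop (an assumption).
reporter = PostEvaluateCheckpointer()
population = {1: {"fitness": None}, 2: {"fitness": None}}
for generation in range(2):
    reporter.start_generation(generation)
    for genome in population.values():
        genome["fitness"] = float(generation)  # stand-in for real evaluation
    reporter.post_evaluate(None, population, None, None)
    # Stand-in for reproduction: fresh, unevaluated genomes with new keys.
    offset = 10 * (generation + 1)
    population = {key + offset: {"fitness": None} for key in (1, 2)}

saved_generation, _, saved_population, _ = pickle.loads(reporter.saved[1])
```

With this ordering, the snapshot stored under label 1 holds the evaluated members of generation 1, so label and contents agree.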
Related to #132
When saving a checkpoint for generation n, we are not actually saving the last evaluated generation. We are saving the newly created population derived from generation n, which is generation n+1.
self.population = self.reproduction.reproduce(self.config, self.species, self.config.pop_size, self.generation)
runs before
self.reporters.end_generation(self.config, self.population, self.species)
In the current implementation, loading a checkpoint treats it as generation n and then overwrites checkpoint n, even though the file actually holds generation n+1 under the name n. Loading a checkpoint repeatedly can therefore lose generations, and it also causes the confusion described in the issue above. I added the fix that was discussed there.
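The off-by-one can be reproduced with a toy model of the loop. `ToyPopulation` and its fields are hypothetical stand-ins, not the library's classes. The sketch only copies the ordering described above: reproduce first, then save under the current generation number.

```python
import pickle


class ToyPopulation:
    """Hypothetical stand-in for the run loop; only the call order matters."""

    def __init__(self):
        self.generation = 0
        self.members = {"created_for": 0, "evaluated": False}
        self.checkpoints = {}  # label -> pickled (label, members)

    def run_one_generation(self):
        # 1. Evaluate generation n.
        self.members["evaluated"] = True
        # 2. Reproduce: replace the population with the one for n + 1.
        self.members = {"created_for": self.generation + 1, "evaluated": False}
        # 3. end_generation: the checkpoint is written under label n.
        self.checkpoints[self.generation] = pickle.dumps(
            (self.generation, self.members)
        )
        self.generation += 1


pop = ToyPopulation()
for _ in range(3):
    pop.run_one_generation()

label, members = pickle.loads(pop.checkpoints[2])
print(label, members["created_for"], members["evaluated"])  # 2 3 False
```

The checkpoint labelled 2 contains the unevaluated population created for generation 3, which is the mismatch this patch addresses.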