CODEEPNEAT restoring state results in divide by zero when training on GPU.
Closed: hyang0129 closed this issue 1 year ago.
This is actually caused by a problem in the initialize population method. When initializing the population from a restored state, it skips setting the input shape, so all genomes have an input shape of None and evaluate to a fitness of 0 because they are invalid models.
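Illustratively, the failure mode described here looks something like the sketch below. The method and attribute names (initialize_population, restored_pop, input_shape) are hypothetical stand-ins rather than TFNE's actual API; the point is only that the restore branch skips the input-shape setup that the fresh-initialization branch performs.

# Hypothetical sketch of the described bug, not TFNE's actual code.
def initialize_population(self, environment, restored_pop=None):
    if restored_pop is not None:
        # Restore path: the population is loaded, but the input shape is never set,
        # so every genome assembles an invalid model and is scored with fitness 0.
        self.pop = restored_pop
    else:
        # Fresh-initialization path: the input shape is set before genomes are created.
        self.input_shape = environment.input_shape()
        self.pop = self._create_initial_population()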
Mr Hongy, did you find a solution for this?
@PaulPauls can you please tell me how to fix this? I really need this functionality, as I don't have the computational resources required to run CoDeepNEAT for a long time.
I've had this ZeroDivisionError in normal use, i.e. even when not restoring a population.
The docs state "If due to the random choice of modules for the blueprint graph an invalid TF model is generated from the genome genotype, the assembled genome is assigned a fitness score of 0". I guess when this happens randomly to all genomes, the total fitness is zero, which triggers the error:
File ...... in CoDeepNEATSelectionMOD._select_modules_param_distance_fixed(self)
216 for spec_id in mod_species_ordered:
217 spec_fitness = self.pop.mod_species_fitness_history[spec_id][self.pop.generation_counter]
--> 218 spec_fitness_share = spec_fitness / total_avg_fitness
219 spec_intended_size = int(round(spec_fitness_share * available_mod_pop))
221 if len(self.pop.mod_species[spec_id]) + self.mod_spec_min_offspring > spec_intended_size:
ZeroDivisionError: division by zero
In fact, looking more closely at the code around the error, it seems this error doesn't require all genomes in the population to have a fitness of zero. Even if just ONE species has a total fitness of zero (i.e. all genomes in that species have zero fitness), the error is raised. To see this, look at lines 216-227 of tfne/algorithms/codeepneat/_codeepneat_selection_mod.py. The loop iterates over all species and, as its last statement, subtracts the current species' fitness from the running total. So if a species with zero fitness comes last in the ordering, the fitnesses of all preceding species have already been subtracted by the time it is reached, total_avg_fitness is zero, and the division raises the error.
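Here is a small standalone reproduction of that mechanism. The loop structure is inferred from the traceback and from reading _select_modules_param_distance_fixed; the fitness values are simplified stand-ins for two healthy species followed by two zero-fitness species.

# Simplified reproduction of the ZeroDivisionError described above (illustrative only).
spec_fitness_by_id = {6: 28.0, 8: 26.0, 9: 0.0, 10: 0.0}
available_mod_pop = 30

total_avg_fitness = sum(spec_fitness_by_id.values())  # 54.0
for spec_id, spec_fitness in spec_fitness_by_id.items():
    # After species 6 and 8 are processed, total_avg_fitness has been reduced to 0.0,
    # so this division raises ZeroDivisionError when species 9 is reached.
    spec_fitness_share = spec_fitness / total_avg_fitness
    spec_intended_size = int(round(spec_fitness_share * available_mod_pop))
    total_avg_fitness -= spec_fitness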
I suppose species with zero fitness can be removed prior to this operation - that should prevent this error.
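A minimal sketch of that suggestion, written as a standalone function rather than the actual TFNE selection code, and using a plain proportional split instead of the running-subtraction scheme above:

# Sketch of the suggested fix: drop zero-fitness species before allocating offspring,
# so the fitness total used as the denominator can never be zero. Illustrative only.
def allocate_offspring(spec_fitness_by_id, available_mod_pop):
    nonzero = {sid: fit for sid, fit in spec_fitness_by_id.items() if fit > 0}
    total_avg_fitness = sum(nonzero.values())
    intended_sizes = {}
    for spec_id, spec_fitness in nonzero.items():
        spec_fitness_share = spec_fitness / total_avg_fitness
        intended_sizes[spec_id] = int(round(spec_fitness_share * available_mod_pop))
    return intended_sizes

# Species 9 and 10 (zero fitness) are filtered out, so no ZeroDivisionError:
print(allocate_offspring({6: 28.0, 8: 26.0, 9: 0.0, 10: 0.0}, available_mod_pop=30))
# -> {6: 16, 8: 14}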
Just to back up what I wrote in the previous comment, here is the output of the run which raised the error in Generation 46:
Evaluating 40 genomes in generation 45...
[========================================] 40/40 Genomes | Genome ID 1840 achieved fitness of 16.8438
############################################################ Population Summary ############################################################
Generation: 45 || Best Genome Fitness: 44.25 || Avg Blueprint Fitness: 26.6785 || Avg Module Fitness: 25.6934
Best Genome: CoDeepNEAT Genome | ID: 186 | Fitness: 44.25 | Blueprint ID: 34 | Module Species: {1, 3} | Optimizer: sgd | Origin Gen: 4
Blueprint Species || Blueprint Species Avg Fitness || Blueprint Species Size
2 || 25.2025 || 3
Best BP of Species 2 || CoDeepNEAT Blueprint | ID: #225 | Fitness: 28.5391 | Nodes: 9 | Module Species: {8, 6} | Optimizer: sgd
4 || 26.438 || 4
Best BP of Species 4 || CoDeepNEAT Blueprint | ID: #218 | Fitness: 28.8477 | Nodes: 8 | Module Species: {8, 6} | Optimizer: sgd
5 || 28.4752 || 3
Best BP of Species 5 || CoDeepNEAT Blueprint | ID: #228 | Fitness: 29.3125 | Nodes: 2 | Module Species: {6} | Optimizer: sgd
Module Species || Module Species Avg Fitness || Module Species Size
6 || 26.6094 || 9
Best Mod of Species 6 || CoDeepNEAT DENSE Module | ID: #523 | Fitness: 31.2864 | Units: 28 | Activ: tanh | Dropout: 0.4
8 || 24.9439 || 11
Best Mod of Species 8 || CoDeepNEAT DENSE Module | ID: #529 | Fitness: 32.125 | Units: 12 | Activ: tanh | Dropout: 0.4
##############################################################################################################################################
Evaluating 40 genomes in generation 46...
[========================================] 40/40 Genomes | Genome ID 1880 achieved fitness of 26.17191
############################################################ Population Summary ############################################################
Generation: 46 || Best Genome Fitness: 44.25 || Avg Blueprint Fitness: 26.7656 || Avg Module Fitness: 18.4226
Best Genome: CoDeepNEAT Genome | ID: 186 | Fitness: 44.25 | Blueprint ID: 34 | Module Species: {1, 3} | Optimizer: sgd | Origin Gen: 4
Blueprint Species || Blueprint Species Avg Fitness || Blueprint Species Size
2 || 27.3138 || 3
Best BP of Species 2 || CoDeepNEAT Blueprint | ID: #225 | Fitness: 28.9375 | Nodes: 9 | Module Species: {8, 6} | Optimizer: sgd
4 || 29.612 || 3
Best BP of Species 4 || CoDeepNEAT Blueprint | ID: #236 | Fitness: 30.5508 | Nodes: 8 | Module Species: {8, 6} | Optimizer: sgd
5 || 24.2197 || 4
Best BP of Species 5 || CoDeepNEAT Blueprint | ID: #228 | Fitness: 29.3789 | Nodes: 2 | Module Species: {6} | Optimizer: sgd
Module Species || Module Species Avg Fitness || Module Species Size
6 || 28.8587 || 10
Best Mod of Species 6 || CoDeepNEAT DENSE Module | ID: #550 | Fitness: 31.6407 | Units: 28 | Activ: tanh | Dropout: 0.4
8 || 26.6219 || 3
Best Mod of Species 8 || CoDeepNEAT DENSE Module | ID: #529 | Fitness: 28.1701 | Units: 12 | Activ: tanh | Dropout: 0.4
9 || 0 || 2
Best Mod of Species 9 || CoDeepNEAT DENSE Module | ID: #536 | Fitness: 0 | Units: 12 | Activ: relu | Dropout: 0.4
10 || 0 || 5
Best Mod of Species 10 || CoDeepNEAT DENSE Module | ID: #537 | Fitness: 0 | Units: 32 | Activ: tanh | Dropout: 0.2
##############################################################################################################################################
... and immediately after this the ZeroDivisionError was raised.
Note that in Gen 45 all species have fitness greater than zero, so there is no error. In Gen 46, however, MODULE species 9 and 10 both have an average fitness of zero, and the best module of each also has a fitness of zero - meaning the total fitness of those species is zero, which is what caused the error, as explained.
Hope this helps someone!
Hey, please excuse my very late answer to this thread. I have not found the time to properly maintain this project, which originated from a 2019 research project into dynamic graphs using Tensorflow. I am aware of a lot of bugs in this research prototype, some of which I know about because I continued the project privately. However, I never found the time to document that progress and therefore never pushed it publicly. Since I can't seem to find the time to maintain this open-source project among my other engagements, I have decided to archive it.
For future work employing evolutionary computation and algorithms, I highly recommend the Google EvoJax library. It uses JAX as the backend ML library, which is much better suited to dynamically changing graphs than Tensorflow but wasn't around back when I started this project. Please see here: