CodeReclaimers / neat-python

Python implementation of the NEAT neuroevolution algorithm
BSD 3-Clause "New" or "Revised" License

Many dangling nodes without a connection to an output are created / left -> network breaks the longer you run it #250

Open markste-in opened 1 year ago

markste-in commented 1 year ago

Describe the bug
I have been running NEAT for a long time and inspecting the growth of the network from time to time. I started to notice that NEAT creates many "dead ends" / "dangling nodes": nodes that lead nowhere and are not connected to another node or an output. Yesterday I found an example with quite a few of them.

I am unsure whether this happened with older versions as well (or to this extent). Maybe this behavior is desired? I would have expected that "dangling" nodes are either removed too when the node connecting them toward the output is removed, or that they are re-connected to the node that comes after the removed one.

There are also connections leading out of an output node that are not recurrent?! I don't think that should happen either.

The longer you run the algorithm, the more "crude" and broken the network gets, up to the point where you have almost no "functional" nodes left, because they are no longer connected to an output.
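
A minimal sketch for detecting such nodes, assuming a genome and its genome_config from a normal run (dangling_nodes is a hypothetical helper name; required_for_output is neat-python's own, from neat/graphs.py):

from neat.graphs import required_for_output

def dangling_nodes(genome, genome_config):
    # Only enabled connections carry signal through the network.
    connections = [cg.key for cg in genome.connections.values() if cg.enabled]
    # Nodes that have an enabled path to at least one output.
    required = required_for_output(genome_config.input_keys,
                                   genome_config.output_keys,
                                   connections)
    # Everything else in the genome is dangling (outputs always count as required).
    return [k for k in genome.nodes if k not in required]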

To Reproduce
I used:

OS:

Linux Ubuntu-2204-jammy-amd64-base 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4 18:03:25 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Expected behavior
No "dangling" nodes, and no connections that "leave" an output (except perhaps recurrent ones). Or: when a node is removed and would leave a dangling node behind, the dangling node should either be removed too or be connected to the node that comes after the removed one.

Screenshots with problematic nodes

[Screenshots: 2022-08-16 at 21:20:50, 21:36:45, and 21:40:31]

Used Config

[NEAT]
pop_size              = 300
fitness_criterion     = max
fitness_threshold     = 1000.0
reset_on_extinction   = 0
no_fitness_termination = True

[DefaultGenome]
num_inputs              = 128
num_hidden              = 0
num_outputs             = 18
initial_connection      = unconnected
feed_forward            = True
compatibility_disjoint_coefficient = 1.0
compatibility_weight_coefficient   = 1.0
conn_add_prob           = 0.35
conn_delete_prob        = 0.25
node_add_prob           = 0.35
node_delete_prob        = 0.25
activation_default      = random
activation_options      = clamped relu sigmoid sin tanh
activation_mutate_rate  = 0.05
aggregation_default     = sum
aggregation_options     = sum min max mean
aggregation_mutate_rate = 0.15
bias_init_type          = gaussian
bias_init_mean          = 0.0
bias_init_stdev         = 1.0
bias_replace_rate       = 0.15
bias_mutate_rate        = 0.8
bias_mutate_power       = 0.4
bias_max_value          = 30.0
bias_min_value          = -30.0
response_init_mean      = 1.0
response_init_stdev     = 0.0
response_replace_rate   = 0.15
response_mutate_rate    = 0.15
response_mutate_power   = 0.15
response_max_value      = 30.0
response_min_value      = -30.0

weight_init_type        = gaussian
weight_max_value        = 30
weight_min_value        = -30
weight_init_mean        = 0.0
weight_init_stdev       = 1.0
weight_mutate_rate      = 0.8
weight_replace_rate     = 0.02
weight_mutate_power     = 0.4
enabled_default         = True
enabled_mutate_rate     = 0.01

single_structural_mutation = false
structural_mutation_surer = default
response_init_type = gaussian
enabled_rate_to_true_add = 0.0
enabled_rate_to_false_add = 0.0

[DefaultSpeciesSet]
compatibility_threshold = 5

[DefaultStagnation]
species_fitness_func = mean
max_stagnation       = 50
species_elitism      = 4

[DefaultReproduction]
elitism            = 2
survival_threshold = 0.2
min_species_size = 50

jtoleary commented 1 year ago

I have the exact same problem!

valpaz commented 1 year ago

Same problem !

nexon33 commented 1 year ago

This seems like a feature. I didn't write this library, but I have read about the mechanisms it uses for evolution. Sometimes neurons get isolated or replaced at random; this includes disabling a connection that was there before, which could also get reconnected at random.

markste-in commented 1 year ago

> This seems like a feature. I didn't write this library, but I have read about the mechanisms it uses for evolution. Sometimes neurons get isolated or replaced at random; this includes disabling a connection that was there before, which could also get reconnected at random.

I tried the following in the config to deactivate the disabling of connections: enabled_mutate_rate = 0.0

I still get a lot of dangling nodes and a mostly "broken" net.

markste-in commented 1 year ago

I created a fork with a changed implementation. It removes all dangling nodes and connections at the end of every run.

If people want to try it out: https://github.com/markste-in/neat-python/tree/remove_dangling_nodes I am eager for any feedback. So far I have had good experiences building big, functional networks.
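
The core of the idea is roughly this (a sketch, not the fork's exact code; dangling_nodes is the hypothetical helper from the issue description above):

def trim_dangling(genome, genome_config):
    dead = set(dangling_nodes(genome, genome_config))
    # Drop the dead nodes themselves...
    for k in dead:
        del genome.nodes[k]
    # ...and every connection that touches one of them.
    genome.connections = {key: cg for key, cg in genome.connections.items()
                          if key[0] not in dead and key[1] not in dead}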

[Screenshot: 2022-09-25 at 09:57:23]

ntraft commented 1 year ago

> > This seems like a feature. I didn't write this library, but I have read about the mechanisms it uses for evolution. Sometimes neurons get isolated or replaced at random; this includes disabling a connection that was there before, which could also get reconnected at random.
>
> I tried the following in the config to deactivate the disabling of connections: enabled_mutate_rate = 0.0
>
> I still get a lot of dangling nodes and a mostly "broken" net.

Even without "disabled" (but still existing) edges, you could still get lots of dangling nodes from deleting edges, right? And you probably wouldn't want to disable edge deletion entirely, so that brings us back to where we started.

I don't know whether other NEAT implementations prevent this. It seems a bit ambiguous as to whether this is desirable or not; maybe those nodes could be rewired to be useful in a future generation. On the other hand, you're right that it adds more and more bloat as the algorithm progresses! Even if we're able to prune these away when instantiating the network, we are still wasting lots of time mutating them when they'll have no impact on the final output.

Btw, have you tried DefaultGenome.get_pruned_copy()? Doesn't that eliminate the dangling output nodes? (Though it doesn't remove the dangling input nodes, since they actually are used.)
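
Something like this, assuming get_pruned_copy is available in your neat-python version (a quick way to compare genome sizes before and after pruning):

# genome and config come from a normal neat-python run.
pruned = genome.get_pruned_copy(config.genome_config)
print(len(genome.nodes), "->", len(pruned.nodes), "nodes")
print(len(genome.connections), "->", len(pruned.connections), "connections")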

Finebouche commented 1 month ago

Hey, I wanted to react to that. I don't think dangling nodes will add more bloat as the algorithm progresses, as:

Finebouche commented 1 month ago

OK, I changed my mind. The problem with dangling nodes is that genome modifications can still impact only those dangling nodes, and the algorithm can get stuck because most contributions to the genome happen in those unconnected parts.

ntraft commented 1 month ago

I've recently realized that this has long been a topic of debate in evolution more broadly: is "neutrality" beneficial or harmful? Is it an important feature of evolvability? (I.e., modifications to the genome which are neutral—they have no effect on selection.) Arguably, it may be useful to have these in the background if they could become useful later. A background store of genetic diversity. But obviously they can also be harmful because they provide no gradient toward improvement.

Finebouche commented 1 month ago

In my experience, anyway, it doesn't seem really useful. I always end up with so many disconnected parts. It seems really hard for network and weight evolution to make sense of those big chunks of nodes (or parts of them) when they happen to get connected.

markste-in commented 1 month ago

Yeah, exactly. In most of the cases I tested, the algorithm got stuck because most of its "brain" was "dead" (dangling nodes without any impact). Further evolution just made it worse, and it never recovered. I thought about changing my implementation from removing dangling nodes "directly" to "removing them after a few generations", to simulate the degeneration of dead cells.

markste-in commented 1 month ago

I forked the project, and in the branch "remove_dangling_nodes" there is now the option trim_dangling_after_n_generations_wo_improvment in the config file.

If you set it to anything greater than 0, it will trim the networks of a species after the species has made no improvement for that many generations. If you set it to 0, it will always trim, and if you set it to anything negative, no trimming is done.
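
For example (a sketch; the value 10 is arbitrary, and I am assuming the option sits in the [NEAT] section, so check the fork's openai-lander config for the actual placement):

[NEAT]
# >0: trim after that many stagnant generations; 0: trim every generation; <0: never trim
trim_dangling_after_n_generations_wo_improvment = 10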

In the code I added a trim function that I call whenever a species has not improved for n generations

I modified the openai-lander example so people can see how to use it (see its config).

You can find the forked repo here: https://github.com/markste-in/neat-python/tree/remove_dangling_nodes

markste-in commented 1 month ago

I also found out that sometimes the outputs of a network are mistakenly used as inputs. I fixed that in my branch too (see the linked issue).

Finebouche commented 1 month ago

Ah, alright. I fixed that in this pull request as well: https://github.com/CodeReclaimers/neat-python/pull/282

markste-in commented 1 month ago

Ah nice, but it looks like you remove potential dangling nodes directly. I am curious whether it might be helpful to keep them around for a while, e.g. leave them for 30 epochs and then start trimming them before they hit the stagnation limit at 40.

Another question: were you able to solve Lunar Lander with it? I solved it a few years ago, but I can't get it solved anymore with the current code base.

Finebouche commented 1 month ago

Hi, actually I don't remove the dangling nodes from the genome, only from the feed-forward network used to do the inference. This way dangling nodes can still evolve, but they don't hurt inference time (since they are useless for inference).
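
The idea is roughly the following (a sketch, not the exact code from the PR; it assumes a genome and config object from a normal neat-python run, and required_for_output comes from neat.graphs):

from neat.graphs import required_for_output

# The genome keeps all of its (possibly dangling) structure for evolution.
enabled = [cg.key for cg in genome.connections.values() if cg.enabled]
required = required_for_output(config.genome_config.input_keys,
                               config.genome_config.output_keys,
                               enabled)

# Only connections that can actually influence an output make it into the
# phenotype used for inference.
used = [(a, b) for (a, b) in enabled
        if b in required and (a in required or a in config.genome_config.input_keys)]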

And yes, I was able to solve the Lunar Lander with it 👍

markste-in commented 1 month ago

Could you share your config? I think I am off somewhere and am trying to figure out where.

Finebouche commented 1 month ago

Hi, sure thing! I made a nice little interface between Gym and neat-python so that you can use any Gymnasium environment easily. Check out this repo: https://github.com/Finebouche/neat-gymnasium It works with my current branch of neat-python.

markste-in commented 1 month ago

Thanks for sharing! I tried to use your config on the current repository, but it never solves the LunarLander. I started to troubleshoot this repo a bit more, since the version in your repo works (I wanted to understand what goes wrong).

It turns out that the fitness function is totally broken. It calculates some log-loss and never converges (I'm not sure what the original intention of the author was, since it is not documented). I started to reverse-engineer the code and implemented a "pure" fitness-oriented approach. Now I am able to solve the environment too.
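
Roughly, the replacement is the usual "fitness = mean total episode reward" setup (a sketch under my assumptions, not the exact code; eval_genome is a hypothetical name, and it assumes a config with 8 inputs and 4 outputs for LunarLander-v2):

import gymnasium as gym
import neat

def eval_genome(genome, config, episodes=5):
    # Pure reward-based fitness: average the total reward over a few episodes.
    env = gym.make("LunarLander-v2")
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            outputs = net.activate(obs)
            action = outputs.index(max(outputs))  # pick the strongest output
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
    env.close()
    return total / episodes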

I will fix the demo and propose a PR so people have a working demo when they find this repo.

Finebouche commented 1 month ago

Ah yes, the lunar example is totally broken.

allinduetime commented 1 month ago

Working on messing with this stuff today; is mark's commit still the best way to handle all the issues, or?

markste-in commented 1 month ago

The main repo here is still broken

but

There are two working solutions right now: a working fork of the current repository, https://github.com/markste-in/neat-python/tree/remove_dangling_nodes

Then there is a good example from @Finebouche: https://github.com/Finebouche/neat-gymnasium It needs the neat package installed and has a working LunarLander example.

You should check out both. Maybe we'll get the best of both merged into here soon.

allinduetime commented 1 month ago

Fair, and when also considering #255, what's the difference compared to Finebouche's #282 fix?

allinduetime commented 1 month ago

…man I am not responding from email again without yeeting the rest

markste-in commented 1 month ago

https://github.com/CodeReclaimers/neat-python/pull/282 and https://github.com/markste-in/neat-python/tree/remove_dangling_nodes both fix the issues mentioned in https://github.com/CodeReclaimers/neat-python/issues/255

allinduetime commented 1 month ago

Alright, thank you, I’m gonna go figure out how to install from GitHub now lol