CodeReclaimers / neat-python

Python implementation of the NEAT neuroevolution algorithm
BSD 3-Clause "New" or "Revised" License

Many dangling nodes without a connection to an output are created / left -> network breaks the longer you run it #250

Open markste-in opened 1 year ago

markste-in commented 1 year ago

Describe the bug
I have been running NEAT for a long time and inspecting the growth of the network from time to time. I started to notice that NEAT creates many "dead ends" / "dangling nodes": nodes that lead nowhere and are not connected to another node or an output. Yesterday I found an example with quite a few of them.

I am unsure whether this happened with older versions as well (or to this extent). Maybe this behavior is desired? I would have expected that "dangling" nodes are either removed too when the node connecting them toward the output is removed, or that they are re-connected to the node that comes after the removed one.

There are also connections leading out of an output node that are not recurrent?! I don't think that should happen either.

The longer you run the algorithm, the more "crude" and broken the network gets, up to the point where you have almost no "functional" nodes left, because they are no longer connected to an output.
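
A minimal sketch for detecting such nodes, assuming a genome and its genome_config from a normal run (dangling_nodes is a hypothetical helper name; required_for_output is neat-python's own, from neat/graphs.py):

from neat.graphs import required_for_output

def dangling_nodes(genome, genome_config):
    # Only enabled connections carry signal through the network.
    connections = [cg.key for cg in genome.connections.values() if cg.enabled]
    # Nodes that have an enabled path to at least one output.
    required = required_for_output(genome_config.input_keys,
                                   genome_config.output_keys,
                                   connections)
    # Everything else in the genome is dangling (outputs always count as required).
    return [k for k in genome.nodes if k not in required]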

To Reproduce
I used:

OS:

Linux Ubuntu-2204-jammy-amd64-base 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4 18:03:25 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Expected behavior
No "dangling" nodes, and no connections that "leave" an output (except perhaps recurrent ones). Or: when a node is removed and would leave a dangling node behind, the dangling node should either be removed too or be connected to the node that comes after the removed one.

Screenshots with problematic nodes

[Screenshots: 2022-08-16 at 21:20:50, 21:36:45, and 21:40:31]

Used Config

[NEAT]
pop_size              = 300
fitness_criterion     = max
fitness_threshold     = 1000.0
reset_on_extinction   = 0
no_fitness_termination = True

[DefaultGenome]
num_inputs              = 128
num_hidden              = 0
num_outputs             = 18
initial_connection      = unconnected
feed_forward            = True
compatibility_disjoint_coefficient = 1.0
compatibility_weight_coefficient   = 1.0
conn_add_prob           = 0.35
conn_delete_prob        = 0.25
node_add_prob           = 0.35
node_delete_prob        = 0.25
activation_default      = random
activation_options      = clamped relu sigmoid sin tanh
activation_mutate_rate  = 0.05
aggregation_default     = sum
aggregation_options     = sum min max mean
aggregation_mutate_rate = 0.15
bias_init_type          = gaussian
bias_init_mean          = 0.0
bias_init_stdev         = 1.0
bias_replace_rate       = 0.15
bias_mutate_rate        = 0.8
bias_mutate_power       = 0.4
bias_max_value          = 30.0
bias_min_value          = -30.0
response_init_mean      = 1.0
response_init_stdev     = 0.0
response_replace_rate   = 0.15
response_mutate_rate    = 0.15
response_mutate_power   = 0.15
response_max_value      = 30.0
response_min_value      = -30.0

weight_init_type        = gaussian
weight_max_value        = 30
weight_min_value        = -30
weight_init_mean        = 0.0
weight_init_stdev       = 1.0
weight_mutate_rate      = 0.8
weight_replace_rate     = 0.02
weight_mutate_power     = 0.4
enabled_default         = True
enabled_mutate_rate     = 0.01

single_structural_mutation = false
structural_mutation_surer = default
response_init_type = gaussian
enabled_rate_to_true_add = 0.0
enabled_rate_to_false_add = 0.0

[DefaultSpeciesSet]
compatibility_threshold = 5

[DefaultStagnation]
species_fitness_func = mean
max_stagnation       = 50
species_elitism      = 4

[DefaultReproduction]
elitism            = 2
survival_threshold = 0.2
min_species_size = 50

jtoleary commented 1 year ago

I have the exact same problem!

valpaz commented 1 year ago

Same problem !

nexon33 commented 1 year ago

This seems like a feature. I didn't write this library, but I have read about the mechanisms it uses for evolution. Sometimes neurons get isolated or replaced at random; this includes disabling a connection that was there before, which could also get reconnected at random.

markste-in commented 1 year ago

> This seems like a feature. I didn't write this library, but I have read about the mechanisms it uses for evolution. Sometimes neurons get isolated or replaced at random; this includes disabling a connection that was there before, which could also get reconnected at random.

I tried the following in the config to deactivate the disabling of connections: enabled_mutate_rate = 0.0

I still get a lot of dangling nodes and a mostly "broken" net.

markste-in commented 1 year ago

I created a fork with a changed implementation. It removes all dangling nodes and connections at the end of every run.

If people want to try it out: https://github.com/markste-in/neat-python/tree/remove_dangling_nodes I am eager for any feedback. So far I have had good experiences building big, functional networks.
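
The core of the idea is roughly this (a sketch, not the fork's exact code; dangling_nodes is the hypothetical helper from the issue description above):

def trim_dangling(genome, genome_config):
    dead = set(dangling_nodes(genome, genome_config))
    # Drop the dead nodes themselves...
    for k in dead:
        del genome.nodes[k]
    # ...and every connection that touches one of them.
    genome.connections = {key: cg for key, cg in genome.connections.items()
                          if key[0] not in dead and key[1] not in dead}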

[Screenshot: 2022-09-25 at 09:57:23]

ntraft commented 1 year ago

> > This seems like a feature. I didn't write this library, but I have read about the mechanisms it uses for evolution. Sometimes neurons get isolated or replaced at random; this includes disabling a connection that was there before, which could also get reconnected at random.
>
> I tried the following in the config to deactivate the disabling of connections: enabled_mutate_rate = 0.0
>
> I still get a lot of dangling nodes and a mostly "broken" net.

Even without "disabled" (but still existing) edges, you could still get lots of dangling nodes from deleting edges, right? And you probably wouldn't want to disable edge deletion entirely, so that brings us back to where we started.

I don't know whether other NEAT implementations prevent this. It seems a bit ambiguous as to whether this is desirable or not; maybe those nodes could be rewired to be useful in a future generation. On the other hand, you're right that it adds more and more bloat as the algorithm progresses! Even if we're able to prune these away when instantiating the network, we are still wasting lots of time mutating them when they'll have no impact on the final output.

Btw, have you tried DefaultGenome.get_pruned_copy()? Doesn't that eliminate the dangling output nodes? (Though it doesn't remove the dangling input nodes, since they actually are used.)
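
Something like this, assuming get_pruned_copy is available in your neat-python version (a quick way to compare genome sizes before and after pruning):

# genome and config come from a normal neat-python run.
pruned = genome.get_pruned_copy(config.genome_config)
print(len(genome.nodes), "->", len(pruned.nodes), "nodes")
print(len(genome.connections), "->", len(pruned.connections), "connections")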

Finebouche commented 1 month ago

Hey, I wanted to react to that. I don't think dangling nodes will add more bloat as the algorithm progresses, as:

Finebouche commented 1 month ago

OK, I changed my mind. The problem with dangling nodes is that genome modifications can still impact only those dangling nodes, and the algorithm can get stuck because most contributions to the genome happen in those unconnected parts.

ntraft commented 1 month ago

I've recently realized that this has long been a topic of debate in evolution more broadly: is "neutrality" beneficial or harmful? Is it an important feature of evolvability? (I.e., modifications to the genome which are neutral—they have no effect on selection.) Arguably, it may be useful to have these in the background if they could become useful later. A background store of genetic diversity. But obviously they can also be harmful because they provide no gradient toward improvement.

Finebouche commented 1 month ago

In my experience, anyway, it doesn't seem really useful. I always end up with so many disconnected parts. It seems really hard for network and weight evolution to make sense of those big chunks of nodes (or parts of them) when they happen to get connected.

markste-in commented 1 month ago

Yeah, exactly. In most of the cases I tested, the algorithm got stuck because most of its "brain" was "dead" (dangling nodes without any impact). Further evolution just made it worse, and it never recovered. I thought about changing my implementation from removing dangling nodes "directly" to "removing them after a few generations", to simulate the degeneration of dead cells.

markste-in commented 1 month ago

I forked the project, and in the branch "remove_dangling_nodes" there is now the option trim_dangling_after_n_generations_wo_improvment in the config file.

If you set it to anything greater than 0, it will trim the networks of a species after the species has made no improvement for that many generations. If you set it to 0, it will always trim, and if you set it to anything negative, no trimming is done.
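
For example (a sketch; the value 10 is arbitrary, and I am assuming the option sits in the [NEAT] section, so check the fork's openai-lander config for the actual placement):

[NEAT]
# >0: trim after that many stagnant generations; 0: trim every generation; <0: never trim
trim_dangling_after_n_generations_wo_improvment = 10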

In the code I added a trim function that I call whenever a species has not improved for n generations

I modified the openai-lander example so people can see how to use it (see its config).

You can find the forked repo here: https://github.com/markste-in/neat-python/tree/remove_dangling_nodes

markste-in commented 1 month ago

I also found out that sometimes the outputs of a network are mistakenly used as inputs. I fixed that in my branch too (see the linked issue).

Finebouche commented 1 month ago

Ah, alright. I fixed that in this pull request as well: https://github.com/CodeReclaimers/neat-python/pull/282

markste-in commented 1 month ago

Ah nice, but it looks like you remove potential dangling nodes directly. I am curious whether it might be helpful to keep them around for a while, e.g. leave them for 30 epochs and then start trimming them before they hit the stagnation limit at 40.

Another question: were you able to solve Lunar Lander with it? I solved it a few years ago, but I can't get it solved anymore with the current code base.

Finebouche commented 1 month ago

Hi, actually I don't remove the dangling nodes from the genome, only from the feed-forward network used to do the inference. This way dangling nodes can still evolve, but they don't hurt inference time (since they are useless for inference).
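
The idea is roughly the following (a sketch, not the exact code from the PR; it assumes a genome and config object from a normal neat-python run, and required_for_output comes from neat.graphs):

from neat.graphs import required_for_output

# The genome keeps all of its (possibly dangling) structure for evolution.
enabled = [cg.key for cg in genome.connections.values() if cg.enabled]
required = required_for_output(config.genome_config.input_keys,
                               config.genome_config.output_keys,
                               enabled)

# Only connections that can actually influence an output make it into the
# phenotype used for inference.
used = [(a, b) for (a, b) in enabled
        if b in required and (a in required or a in config.genome_config.input_keys)]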

And yes, I was able to solve the Lunar Lander with it 👍

markste-in commented 1 month ago

Could you share your config? I think I am off somewhere and am trying to figure out where.

Finebouche commented 1 month ago

Hi, sure thing! I made a nice little interface between Gym and neat-python so that you can use any Gymnasium environment easily. Check out this repo: https://github.com/Finebouche/neat-gymnasium It works with my current branch of neat-python.

markste-in commented 1 month ago

Thanks for sharing! I tried to use your config on the current repository, but it never solves the LunarLander. I started to troubleshoot this repo a bit more, since the version in your repo works (I wanted to understand what goes wrong).

It turns out that the fitness function is totally broken. It calculates some log-loss and never converges (I'm not sure what the original intention of the author was, since it is not documented). I started to reverse-engineer the code and implemented a "pure" fitness-oriented approach. Now I am able to solve the environment too.
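
Roughly, the replacement is the usual "fitness = mean total episode reward" setup (a sketch under my assumptions, not the exact code; eval_genome is a hypothetical name, and it assumes a config with 8 inputs and 4 outputs for LunarLander-v2):

import gymnasium as gym
import neat

def eval_genome(genome, config, episodes=5):
    # Pure reward-based fitness: average the total reward over a few episodes.
    env = gym.make("LunarLander-v2")
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            outputs = net.activate(obs)
            action = outputs.index(max(outputs))  # pick the strongest output
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
    env.close()
    return total / episodes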

I will fix the demo and propose a PR so people have a working demo when they find this repo.

Finebouche commented 1 month ago

Ah yes, the lunar example is totally broken.

allinduetime commented 1 month ago

Working on messing with this stuff today; is mark's commit still the best way to handle all the issues, or?

markste-in commented 1 month ago

The main repo here is still broken

but

There are two working solutions right now: a working fork of the current repository, https://github.com/markste-in/neat-python/tree/remove_dangling_nodes

Then there is a good example from @Finebouche: https://github.com/Finebouche/neat-gymnasium It needs the neat package installed and has a working LunarLander example.

You should check out both. Maybe we'll get the best of both merged into here soon.

allinduetime commented 1 month ago

Fair, and when also considering #255, what's the difference compared to Finebouche's #282 fix?

allinduetime commented 1 month ago

…man I am not responding from email again without yeeting the rest

markste-in commented 1 month ago

https://github.com/CodeReclaimers/neat-python/pull/282 and https://github.com/markste-in/neat-python/tree/remove_dangling_nodes both fix the issues mentioned in https://github.com/CodeReclaimers/neat-python/issues/255

allinduetime commented 1 month ago

Alright, thank you, I’m gonna go figure out how to install from GitHub now lol