HyperCodec / neat

Crate for implementing NeuroEvolution of Augmenting Topologies
MIT License

Rare hang #58

Open HyperCodec opened 6 months ago

HyperCodec commented 6 months ago

Not sure how this is happening but in extremely rare circumstances it is possible to hang indefinitely. See #57 workflow run for more info.

My guess is that there is one very rare edge case that causes an RwLock to be locked while still in use by a child node, but that shouldn't be possible with the well-tested cycle-prevention algorithm. This definitely requires further debugging, but it is so rare and obscure that it is difficult to catch it in the act and recover the details of what happened.

Bowarc commented 6 months ago


After a couple of generations, it just stops and I don't know why.

It appears to be 100% of the time with my current test, I pushed it at https://github.com/Bowarc/doodlai_jump/tree/ea955a6b681fcbaa2a4e3ec6d81f14970d5414b7

(The /ring package is responsible for training (the one hanging after a couple of generations), game is a lib for a really simple version of doodle jump, and display is to see the AI play.)

HyperCodec commented 6 months ago


After a couple of generations, it just stops and I don't know why.

It appears to be 100% of the time with my current test, I pushed it at https://github.com/Bowarc/doodlai_jump/tree/ea955a6b681fcbaa2a4e3ec6d81f14970d5414b7

(The /ring package is responsible for training (the one hanging after a couple of generations), game is a lib for a really simple version of doodle jump, and display is to see the AI play.)

Hmm, so it's probably something with a recursive RwLock. I'll have to look into it further. It's probably some internal function causing a cyclic neuron dependency (like the DFS not working or something).

HyperCodec commented 6 months ago

Btw @Bowarc, can you use the serde feature to dump a JSON (or RON) file on the generation that hangs? (Probably the easiest way to do this would be to overwrite the same file with each generation and then stop the program when it hangs.)
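The overwrite-one-file approach can be sketched without any dependencies: keep rewriting the same path each generation, so whatever is on disk when the process hangs is the generation that triggered it. A minimal std-only sketch (the `Genome` type and `dump_generation` helper are hypothetical stand-ins; the crate's serde feature would serialize the real DNA type instead of Debug-formatting it):

```rust
use std::fs;

// Hypothetical stand-in for the crate's DNA/genome type; the real code
// would serialize it with serde instead of Debug formatting.
#[derive(Debug)]
struct Genome {
    bias: f32,
}

// Overwrite the same file every generation, so whatever is on disk when
// the process hangs is exactly the generation that triggered it.
fn dump_generation(path: &str, genomes: &[Genome]) -> std::io::Result<()> {
    let body: String = genomes.iter().map(|g| format!("{:?}\n", g)).collect();
    fs::write(path, body)
}

fn main() -> std::io::Result<()> {
    let population = vec![Genome { bias: 0.5 }, Genome { bias: 0.25 }];
    dump_generation("sim.backup.txt", &population)
}
```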

Bowarc commented 6 months ago

Ok, I'll do that tomorrow.

Bowarc commented 6 months ago

Well, I stayed up longer than expected 😅 Here is the DNA of every genome of a sim that froze:

DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [RwLock { data: NeuronTopology { inputs: [(Input(3), -0.5625169)], bias: 0.59482414, activation: sigmoid
 }, poisoned: false, .. }], output_layer: [RwLock { data: NeuronTopology { inputs: [(Hidden(0), 0.9505495)], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.97373414), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }
DNA { network: NeuralNetworkTopology { input_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.19538373, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9611819, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5509694, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.31042653, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9654784, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.81183213, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.86611843, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.9298546, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8283311, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.8759112, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.4996699, activation: linear_activation
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [], bias: 0.5423544, activation: linear_activation
 }, poisoned: false, .. }], hidden_layers: [], output_layer: [RwLock { data: NeuronTopology { inputs: [], bias: 0.010660529, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(6), -0.1170296)], bias: 0.39411813, activation: sigmoid
 }, poisoned: false, .. }, RwLock { data: NeuronTopology { inputs: [(Input(7), -0.8857193), (Input(11), -0.97913766), (Input(2), 0.2923255), (Input(1), 0.26824117), (Input(2), -0.8934064), (Input(10), -0.19709682), (Input(9), -0.92098737), (Input(6), -0.9772694), (Input(7), 0.08727813), (Input(3), -0.61651254), (Input(9), 0.42674088), (Input(7), -0.801528), (Input(1), 0.6078919)], bias: 0.7993354, activation: sigmoid
 }, poisoned: false, .. }], mutation_rate: 0.01, mutation_passes: 3 } }

I made a new commit if you want to check it: 8d75367

HyperCodec commented 6 months ago

Well, I stayed up longer than expected 😅 Here is the DNA of every genome of a sim that froze...

Something I noticed here is that one of the output neurons in each genome has a lot of inputs from input-layer neurons. I doubt this is just a result of evolution, given the huge ratio between it and the other neurons. Probably another issue to fix.

Anyways, I created #61 for the duplicate neuron references in the inputs to that output neuron.
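For reference, duplicated sources like the repeated Input(7) and Input(9) entries in the dumps above can be detected with a single set pass. A minimal sketch (the `NeuronLocation` enum here is a hypothetical mirror of the crate's input reference, not its actual API):

```rust
use std::collections::HashSet;

// Hypothetical mirror of the crate's input reference: which neuron an
// edge comes from. The weight is irrelevant for the duplicate check.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum NeuronLocation {
    Input(usize),
    Hidden(usize),
}

// True if any source neuron appears more than once, like the repeated
// Input(7) entries in the dumps above.
fn has_duplicate_inputs(inputs: &[(NeuronLocation, f32)]) -> bool {
    let mut seen = HashSet::new();
    inputs.iter().any(|&(loc, _)| !seen.insert(loc))
}

fn main() {
    let inputs = [
        (NeuronLocation::Input(7), -0.88),
        (NeuronLocation::Input(11), -0.97),
        (NeuronLocation::Input(7), 0.08), // same source twice
    ];
    assert!(has_duplicate_inputs(&inputs));
    assert!(!has_duplicate_inputs(&inputs[..2]));
}
```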

HyperCodec commented 6 months ago

Merged #62, which is the main suspect of this issue.

@Bowarc Can you try to run with neat = { git = "https://github.com/hypercodec/neat", branch = "dev", features = ["whateveryouhadbefore"] } and see if it still hangs?

Bowarc commented 6 months ago

I've now tested over 3k generations and it seems to be stable, thank you for the fix (I had ["crossover", "rayon", "serde"] as features).

HyperCodec commented 6 months ago

Np

HyperCodec commented 6 months ago

You can use the dev branch for now, but it's not a good branch to stay on because of large API changes; definitely change back to stable after the next release.

Bowarc commented 6 months ago

Alright, thanks!

Bowarc commented 6 months ago

Oh, I swapped to DivisionReproduction (I was on CrossoverReproduction before), and on the first try, after about 125 generations, it deadlocked.

Here is the simulation data: sim.backup.txt

I tried more tests, even went back to CrossoverReproduction with crossover_pruning_nextgen, but it appears to be deadlocking 100% of the time again. After more tests I found that with too low a number of genomes per generation (<100), it deadlocks in about 10/100 gens. Seems fine with 1000 genomes/gen.

DivisionReproduction hangs after a bit with 1000 genomes; here is the sim data: sim.backup.txt

HyperCodec commented 6 months ago

I swapped to DivisionReproduction (I was on CrossoverReproduction before), and on the first try, after about 125 generations, it deadlocked.

I tried more tests, even went back to CrossoverReproduction with crossover_pruning_nextgen, but it appears to be deadlocking 100% of the time again. After more tests I found that with too low a number of genomes per generation (<100), it deadlocks in about 10/100 gens. ~Seems fine with 1000 genomes/gen~

DivisionReproduction hangs after a bit with 1000 genomes; here is the sim data: sim.backup.txt

Interesting that it made it through ~3k generations without deadlocking on CrossoverReproduction the first time but not the second time. Perhaps you just got really lucky on that run. At least this eliminates the premise that the duplicate neuron inputs are causing the deadlock (although they probably were also causing a deadlock in and of themselves; maybe there are just multiple issues here).

HyperCodec commented 6 months ago

After looking through your backup files, I noticed that there are still duplicate inputs. I am not sure this time how they are being created.

Bowarc commented 6 months ago

While testing performance & learning curves, I found out that a high mutation rate (>=0.1) deadlocks in less than 50 gens 100% of the time. Now that I think of it, that might be the difference between me saying that it looks good and me saying that it doesn't work again.

Example:

pub const NB_GAMES: usize = 3;
pub const GAME_TIME_S: usize = 20; // Nb of seconds we let the AI play the game before registering their score
pub const GAME_DT: f64 = 0.05; // 0.0166
pub const NB_GENERATIONS: usize = 100;
pub const NB_GENOME_PER_GEN: usize = 2000;

neat::NeuralNetworkTopology::new(0.2, 3, rng)

Deadlocks in 15 generations

sim15.backup.txt

HyperCodec commented 6 months ago

While testing performance & learning curves, I found out that a high mutation rate (>=0.1) deadlocks in less than 50 gens 100% of the time. Now that I think of it, that might be the difference between me saying that it looks good and me saying that it doesn't work again.

Example:


pub const NB_GAMES: usize = 3;
pub const GAME_TIME_S: usize = 20; // Nb of seconds we let the AI play the game before registering their score
pub const GAME_DT: f64 = 0.05; // 0.0166
pub const NB_GENERATIONS: usize = 100;
pub const NB_GENOME_PER_GEN: usize = 2000;

neat::NeuralNetworkTopology::new(0.2, 3, rng)

Deadlocks in 15 generations

sim15.backup.txt

So yeah the deadlock issue is probably one of the mutations.

HyperCodec commented 5 months ago

I wonder if the deadlock might be happening during the mutation phase, leading to something that can't be accurately debugged as it hasn't finished mutating the neural network before it deadlocks.

HyperCodec commented 5 months ago

Might not necessarily mean anything, but I just ran some stress tests and such on Windows on the dev branch (rayon and crossover), and it didn't deadlock once.

Either I'm just really lucky or this has something to do with platform-specific things.

Bowarc commented 5 months ago

Have you tried a high mutation rate?

HyperCodec commented 5 months ago

Yeah I just got lucky, it happens on any platform.

I did more testing and found that the deadlock is during the running phase, meaning that it's still probably some type of recursive RwLock.
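The recursive-RwLock hypothesis is easy to model with `std::sync::RwLock`: a thread that re-enters a lock it already holds exclusively waits on itself forever (and even read-after-read recursion can deadlock when a writer is queued in between). A small std-only demonstration using the non-blocking variants, where a blocking `read()` would hang instead of returning an error:

```rust
use std::sync::RwLock;

fn main() {
    let lock = RwLock::new(0u32);

    // Acquire the lock exclusively, as a parent neuron evaluation might.
    let guard = lock.write().unwrap();

    // Re-entering the same lock from the same thread can never succeed;
    // with the blocking read()/write() methods this is the hang.
    assert!(lock.try_read().is_err());
    assert!(lock.try_write().is_err());

    // Once the guard is dropped, the lock is usable again.
    drop(guard);
    assert!(lock.try_read().is_ok());
}
```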

HyperCodec commented 5 months ago

Still can't find this deadlock even after weeks; it's being really evasive.

It's almost certainly a recursive RwLock or a duped input, but I have code to prevent both of those from happening.

I thought it might be something like those while loops that reroll until a valid state is reached looping infinitely because there is no valid state, but the deadlock doesn't happen during mutation, so it can't be that (although I probably do want to patch that; it's extremely rare and unlikely to ever happen, but it is still a possibility).

I'm really just out of ideas for what could possibly cause this issue.

HyperCodec commented 5 months ago

While I think this is definitely a high-priority issue that urgently needs to be fixed, I'll take a break from it so it doesn't keep taking time away from new features and such.

HyperCodec commented 4 months ago

I think I found the cause of the issue: if all threads hold a lock while waiting on other tasks, rayon has no free thread left to pick up and run those dependency tasks.

HyperCodec commented 4 months ago

Created https://github.com/rayon-rs/rayon/issues/1181; waiting for confirmation on a solution. If rayon takes too long to introduce a fix, I can probably make a temporary one here.

dsgallups commented 2 months ago

I'm not sure if this helps; I've been working on a crate based on yours and noticed that the network topology is able to create cycles in the data structure of the neural network. Please let me know if I'm missing something! (drawing a picture real quick)

dsgallups commented 2 months ago

Visual example attached


While NeuralNetworkTopology::mutate checks for duplicate inputs, it does not appear to resolve graph cycles. I think back edge detection would work here.

Edit: I've implemented this here
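The back-edge detection suggested above is standard three-color DFS: an edge into a node that is still on the recursion stack closes a cycle. A minimal std-only sketch (an illustration, not the crate's or the fork's implementation):

```rust
// Three-color DFS: White = unvisited, Gray = on the current DFS stack,
// Black = fully explored. An edge into a Gray node is a back edge,
// which means the graph contains a cycle.
#[derive(Clone, Copy, PartialEq)]
enum Color {
    White,
    Gray,
    Black,
}

fn visit(n: usize, adj: &[Vec<usize>], color: &mut [Color]) -> bool {
    color[n] = Color::Gray;
    for &m in &adj[n] {
        if color[m] == Color::Gray {
            return true; // back edge: m is still on the stack
        }
        if color[m] == Color::White && visit(m, adj, color) {
            return true;
        }
    }
    color[n] = Color::Black;
    false
}

fn has_cycle(adj: &[Vec<usize>]) -> bool {
    let mut color = vec![Color::White; adj.len()];
    (0..adj.len()).any(|n| color[n] == Color::White && visit(n, adj, &mut color))
}

fn main() {
    // 0 -> 1 -> 2 -> 0 is cyclic; dropping the 2 -> 0 edge makes it a DAG.
    assert!(has_cycle(&[vec![1], vec![2], vec![0]]));
    assert!(!has_cycle(&[vec![1], vec![2], vec![]]));
}
```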

HyperCodec commented 2 months ago

Visual example attached


While NeuralNetworkTopology::mutate checks for duplicate inputs, it does not appear to resolve graph cycles. I think back edge detection would work here.

Edit: I've implemented this here

I had a DFS algorithm that was attempting to resolve these loops. I'm pretty sure I had it working, but it's kind of hard to tell with how random things are in genetic simulations.

https://github.com/HyperCodec/neat/blob/main/src/topology/mod.rs#L119

I've also narrowed this down to pretty much only ever happening with the rayon feature enabled, so I'm thinking it's probably some lock contention. CPU usage also drops sharply, which suggests the threads are parked.

HyperCodec commented 2 months ago

Now that I think about it, I should really use seeded rng when testing these things to get rid of some of the randomness.

dsgallups commented 2 months ago

Found it in my fork. On deeply nested structures, par_iter(...).sum ends up blocked on every thread, and therefore no value can be returned when the inputs are summed:

https://github.com/HyperCodec/neat/blob/228f7af2c55be5b1fb2c714d99b6ea633dfc5e14/src/runnable.rs#L106-L111

the sum operation can never complete, even if all of the iterator's components have returned. This is because, at the instant the final child completes, its thread is returned to the pool. Before the sum operation is handed to this last open thread, that thread is allocated to another par_iter that will block. Then all of the other threads in rayon's pool are blocked (some of them waiting on this node's function to return) and cannot be given out to complete the sum op. I had posted a proof of concept but have since made my repo private.

HyperCodec commented 2 months ago

Found it in my fork. On deeply nested structures, par_iter(...).sum ends up blocked on every thread, and therefore no value can be returned when the inputs are summed:

https://github.com/HyperCodec/neat/blob/228f7af2c55be5b1fb2c714d99b6ea633dfc5e14/src/runnable.rs#L106-L111

the sum operation can never complete, even if all of the iterator's components have returned. This is because, on the final return of the iterator, the thread is returned to the pool. Before the sum operation is handed to the open thread, that thread is allocated to another par_iter that will block. Then all of the other threads in rayon's pool are blocked (some of them waiting on this node's function to return) and cannot be given out to complete the sum op. I had posted a proof of concept but have since made my repo private.

Are you sure this is because of the lazily stacked sum and not the call to map before it, which uses RwLocks and such?

If sum is causing this, would converting back to a single-threaded iterator after mapping solve the issue?

dsgallups commented 2 months ago

Good point! Let me make a real fork real quick with rayon, run it with a high SplitConnection mutation rate, and compare.

dsgallups commented 2 months ago

Ah, you were right. for_each also does not complete, even after the result is returned. I've essentially been using trace logging to determine this. Here are the details, trying an RwLock instead of using .sum:

        let sum = RwLock::new(0.);

        self.inputs()
            .unwrap()
            .par_iter()
            .enumerate()
            .for_each(|(idx, input)| {
                info!(
                    "{} REQUEST INPUT ({}/{})",
                    self.id_short(),
                    idx,
                    num_inputs - 1
                );
                let res = input.get_input_value(self.id_short(), idx);
                info!(
                    "{} RECEIVED INPUT ({}/{}) ({})",
                    self.id_short(),
                    idx,
                    num_inputs - 1,
                    res
                );
                let mut sum = sum.write().unwrap();
                *sum += res;
            });

        info!("{} RETURNING RESULT FROM INPUTS", self.id_short());

        let sum = sum.into_inner().unwrap();
        self.activated_value = Some(sum);

The following log shows a neuron that has received back all of its inputs. However, the function never returns: log lines from other threads continue after this point, but the final info trace for this particular node is never reached.

2024-09-24T16:05:14.445053Z  INFO candle_neat::simple_net::neuron: 398ba9 RECEIVED INPUT (0/1) (0)
2024-09-24T16:05:14.445084Z  INFO candle_neat::simple_net::neuron: 398ba9 RECEIVED INPUT (1/1) (0)

dsgallups commented 2 months ago

One interesting property to note: at least on my end, attaching by_uniform_blocks(1) to the parallel iterator stops this blocking behavior, or at least that's what I've found after running a super high split rate for 5-6 minutes. I'm pretty sure this just makes the iterator effectively sequential, but yeah lol

HyperCodec commented 2 months ago

neat hang diagram

This is a little diagram I made explaining my earlier theory. I'm not sure what can be done to prevent this without completely forking rayon (to make a custom lock type compatible with it) or making some hacky spinlock solution with tons of rayon::yield_now() calls.

HyperCodec commented 2 months ago

The reason this doesn't always deadlock is that rayon is work-stealing: if any thread finishes before the others (e.g. the dependency task is the first one added to its queue, or all the base tasks are on some other thread), it can steal tasks from the waiting threads, preventing the deadlock.

This deadlock only happens when all threads have a waiting task at the start of their queue, which isn't super common (and gets much rarer with each CPU core added).

HyperCodec commented 2 months ago

@dsgallups would you be able to look into this a bit? There is an issue on the rayon GitHub about it (https://github.com/rayon-rs/rayon/issues/592) but it's been open since 2018 and doesn't appear like it's going to be fixed any time soon.

HyperCodec commented 2 months ago

From that issue it looks like there's a workaround using a custom ThreadPool for the locking work, but I'm not sure how well that will play with a recursive algorithm like this.

dsgallups commented 2 months ago

If this were async, I'd know how to handle it with tokio, since tasks hand their thread back to the pool across await boundaries. It's an interesting challenge to determine when an iterator is ready to complete while all threads are blocked. I'll take a look into it

edit: Going to see if https://github.com/rayon-rs/rayon/pull/1175 is a quick win

dsgallups commented 2 months ago

Unfortunately, I've decided not to pursue debugging rayon. I'm opting to do network expansion, transforming the network into a set of tensors as defined here and running on candle-rs. Hope someone else will be able to figure this one out! Just wanted to give an update. Thanks for your efforts!