ArtemGr / bounty

Bounty navigator

Post-ELM genetic #36

Closed ArtemGr closed 2 years ago

ArtemGr commented 2 years ago

(After some consideration, I think that instead of implementing a genetic algorithm on a linked graph, as this "Evolving Neural Networks through Augmenting Topologies" implementation does, which involves growing and shrinking neural connections with heap allocations and pointers, we can make do with a fixed switchboard-like matrix (essentially a bounded simple directed graph) where absent connections are encoded as NOP functions.)

The idea is to (optionally) use the fast ELM training to find the initial neural network configuration, then add a number of NOP hidden layers to that network (in order not to bother with fine-grained connectivity mutations) and run a simple genetic algorithm to further fine-tune the network (removing a neuron = setting it to NOP, adding a neuron = setting it to a non-NOP function, which reduces mutations to changes of function and weight).
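A minimal Rust sketch of that switchboard encoding; the names, the op set, and the mutation steps are illustrative (not the actual bounty code), and the `rand` crate is used for brevity:

```rust
use rand::Rng;

#[derive(Clone, Copy)]
enum Op {
    Nop,  // absent neuron / connection
    Tanh,
    Sin,
}

#[derive(Clone)]
struct Slot {
    op: Op,
    weight: f32,
    bias: f32,
}

#[derive(Clone)]
struct Net {
    slots: Vec<Slot>,  // fixed size: evolution never allocates or frees connections
}

impl Net {
    /// One mutation step: the structure never changes, we only flip an op or nudge a number.
    fn mutate<R: Rng>(&mut self, rng: &mut R) {
        let i = rng.gen_range(0..self.slots.len());
        match rng.gen_range(0..3) {
            0 => {
                // "remove a neuron" = set it to Nop, "add a neuron" = pick a non-Nop op
                self.slots[i].op = match rng.gen_range(0..3) {
                    0 => Op::Nop,
                    1 => Op::Tanh,
                    _ => Op::Sin,
                };
            }
            1 => self.slots[i].weight += rng.gen_range(-0.05..0.05),
            _ => self.slots[i].bias += rng.gen_range(-0.05..0.05),
        }
    }
}
```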

We are particularly interested in making it easy to add custom and parameterized functions into the network (such as Snake, with its a parameter being "a learnable parameter for each preactivation value", or the functions a Fourier series is made of).
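For example, a parameterized activation could be mutated like any other weight. A minimal sketch, assuming the snake_a(x) = x + sin²(a·x)/a form from the Snake paper, with the surrounding structure made up for illustration:

```rust
#[derive(Clone, Copy)]
struct Snake {
    a: f32,  // learnable frequency parameter, one per preactivation, mutated like a weight
}

impl Snake {
    fn activate(&self, x: f32) -> f32 {
        // snake_a(x) = x + sin^2(a * x) / a
        x + (self.a * x).sin().powi(2) / self.a
    }

    /// Mutating the activation = nudging its parameter; no structural change needed.
    fn mutate(&mut self, step: f32) {
        self.a = (self.a + step).max(1e-3);  // keep `a` away from zero
    }
}
```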

p.s. If we already have the ELM, then why use genetic mutations at all? In a classic full-training-set environment most mutations will only make the network fit the training set less well. One intuition here is to apply background mutation in a changing environment. ELM training requires the entirety of the training set (to compute the pseudo-inverse on). Compared to other training methods it's fast, but genetic mutation can run step by tiny step in the background, testing the fitness of the resulting NN instances on just the tail of the latest dataset. Even though the genetic search is less efficient than the pseudo-inverse, the cost is amortized by the incremental nature of our approach. (Kind of like round-up investing.) Another hypothesis is that genetic mutation can employ a larger number of custom functions than ELM training alone.
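A minimal sketch of one such background step, building on the hypothetical `Net` above; the `fitness` closure (lower = better) stands in for whatever error measure the network uses:

```rust
use rand::Rng;

/// Mutate one candidate, score it on just the tail of the latest samples,
/// keep it only if it is not worse. Cheap enough to run step by tiny step.
fn background_step<R, F>(best: &mut Net, dataset: &[(f32, f32)], tail: usize, fitness: F, rng: &mut R)
where
    R: Rng,
    F: Fn(&Net, &[(f32, f32)]) -> f32,
{
    // Score only the most recent samples, so a single step stays cheap.
    let recent = &dataset[dataset.len().saturating_sub(tail)..];
    let mut candidate = best.clone();
    candidate.mutate(rng);
    if fitness(&candidate, recent) <= fitness(best, recent) {
        *best = candidate;  // accept the mutant; otherwise discard and try again next step
    }
}
```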

(p.s. cf. "we use evolutionary programming to produce highly accurate, parsimonious, and robust predictive models. We “evolve” algorithms for a particular problem from math, constants, and variables in a digital ecosystem" - Beau)

p.s. Genetic mutation also allows us to train a network based on a (post-factum) fitness scalar (Reinforcement Learning?), whereas ELM training only works when we can manually synthesize a perfect output layer. That being said, every ELM is already a mutant, possibly more so than a gradient descent instance.

p.s. Something we figured around Probabilistic Graphical Models is that traditional NNs are too black-box. We might want to build a network around the relationships we know of, fine-tune these specific relationships on the data, and then and only then fall back on whole-network training.
Specifically, it's useful to run just the bias, weight, and activation mutations on a particular neural connection, such as mapping the action and the past velocity to the new Cart Pole velocity. Isolating that connection allows us to fine-tune it much faster and on a larger dataset.
(And to automatically adjust the power, aka the step size, of mutations to better match the range of values expected in the connection.)
When several such connections are trained, we'd like to protect them while running a general-purpose whole-network genetic algorithm (or ELM pinv training), allowing it to focus on just the missing connections.
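A minimal sketch of such protection, again building on the hypothetical `Net`/`Slot` above: the fine-tuned slots are frozen, and the whole-network mutation only touches the rest.

```rust
use rand::Rng;
use std::collections::HashSet;

/// Mutate only the slots that are still open to evolution, leaving the
/// already fine-tuned connections untouched.
fn mutate_unprotected<R: Rng>(net: &mut Net, frozen: &HashSet<usize>, rng: &mut R) {
    let open: Vec<usize> = (0..net.slots.len()).filter(|i| !frozen.contains(i)).collect();
    if open.is_empty() {
        return;
    }
    let i = open[rng.gen_range(0..open.len())];
    // A per-slot mutation step could be adjusted here to the value range expected in the connection.
    net.slots[i].weight += rng.gen_range(-0.05..0.05f32);
}
```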

This might be particularly sensitive for genetic algorithms: on a large search space the ELM and gradient descent would often find a local minimum which might be "good enough", whereas a genetic algorithm would not converge. We need to narrow down the search space for a genetic algorithm to be effective. "Sacks will go far, if he does not go too far." Narrowing the space is what saves it from going too far.

p.s. "In knowledge-based neural networks, an early example of a neurosymbolic model (Towell et al., 1990), a set of hand-written symbolic rules were compiled into a neural network, which is then refined using data." ... "A second problem specific to neural networks is the integration of existing information into the network. Complex, hand-designed networks can be viewed as an attempt to give networks some implicit knowledge of a problem domain."

ArtemGr commented 2 years ago

I think I figured it out with https://github.com/ArtemGr/bounty/commit/9922ad179ab4aeed60e9e9942439b800f43c5126. : D

We can embed domain knowledge, combine mutation with backpropagation, use strange activations and what not.

ArtemGr commented 2 years ago

https://arxiv.org/pdf/1705.05502.pdf shows the power of deep, sparsely connected layers. As PushGP suggests, it might be possible to evolve a program operating on such a complex layered structure by mutating just a linear vector of instructions, provided that these instructions include stack or memory access. (A single strand of DNA can unfold into a three-dimensional and multiplanar human.)
Arguably, the net effect is not that different from a traditional graph of sparsely connected "neurons", for we can represent a "program" as a graph of functions between the memory cells. But the linear encoding might be easier to work with. Adding a new connection between "neurons" amounts to adding a function to the set of instructions. The order of instructions allows for finite recurrent loops between "neurons" (n1 = sin n2; n2 = cos n1). A "neuron" might even loop to itself (n1 = sin n1).
Importantly, the two dimensions of a traditional NN graph (layer X, neuron Y) are collapsed into a single dimension (neuron X, not assigned to a layer), implemented as a temporary HashMap<u16, f16> or Vec<f16>. A network of any complexity is just two vectors: one of memory cells and one of function connections between them.
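A minimal sketch of that two-vector representation (f32 instead of f16 for simplicity; the function set and field names are illustrative):

```rust
#[derive(Clone, Copy)]
enum Fun { Nop, Sin, Cos, Mul }

#[derive(Clone, Copy)]
struct Instr {
    fun: Fun,
    src: u16,    // index of the source cell ("neuron")
    dst: u16,    // index of the destination cell
    weight: f32,
}

struct Program {
    cells: Vec<f32>,   // the collapsed single dimension: neuron X, no layers
    code: Vec<Instr>,  // ordered connections; the order gives finite recurrence, e.g. n1 = sin n2; n2 = cos n1
}

impl Program {
    fn run(&mut self) {
        for ins in &self.code {
            let x = self.cells[ins.src as usize];
            let y = match ins.fun {
                Fun::Nop => continue,  // absent connection
                Fun::Sin => (ins.weight * x).sin(),
                Fun::Cos => (ins.weight * x).cos(),
                Fun::Mul => ins.weight * x,
            };
            self.cells[ins.dst as usize] = y;  // a cell may even feed itself (n1 = sin n1)
        }
    }
}
```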
It could be that a simplified backpropagation is possible (while skipping the functions that don't (yet) support it) merely by going through the instruction set backwards.
And with a different subset of functions we might evolutionarily construct probabilistic graphical models (aka structure learning).