CodeReclaimers / neat-python

Python implementation of the NEAT neuroevolution algorithm
BSD 3-Clause "New" or "Revised" License

random() function in genome.py #105

Open evolvingfridge opened 7 years ago

evolvingfridge commented 7 years ago

Are there any experiments that used different random generators, or is it well known that mutation should only use a normal distribution? In general, does such an experiment make sense?

drallensmith commented 7 years ago

Well, it depends on whether you're talking about initialization of a FloatAttribute (weights, biases, various ones for multiparameter functions) or subsequent mutations. The first can use either a gaussian/normal distribution or a uniform distribution. (Multiparameter functions default to the second, BTW.) Subsequent mutations can either re-initialize the attribute or, usually much more frequently, add a zero-mean gaussian.
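To make the two stages concrete, here's a rough sketch of the behavior described above (not the library's actual FloatAttribute code; the function names and parameters are just illustrative, and the uniform init is assumed to span mean ± 2*stdev, per the range*0.25 convention mentioned later in this thread):

```python
import random

def init_float(init_type, mean, stdev, min_value, max_value):
    # Initialization: gaussian draws around the configured mean;
    # uniform covers a fixed interval around it instead.
    if init_type == 'gaussian':
        value = random.gauss(mean, stdev)
    else:  # 'uniform'
        value = random.uniform(mean - 2.0 * stdev, mean + 2.0 * stdev)
    return max(min_value, min(max_value, value))

def mutate_float(value, mutate_power, min_value, max_value):
    # Subsequent mutation: add a zero-mean gaussian step, then clamp.
    return max(min_value, min(max_value, value + random.gauss(0.0, mutate_power)))
```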

In terms of genome.py, it uses a mixture of random and choice calls. The former are used for single checks against a probability; the latter essentially do a uniform selection over the possibilities.

Quite a lot of neural network work (much of it non-evolutionary, some evolutionary) uses a uniform distribution for weight initialization. Non-uniform choices among alternatives are also used in a lot of evolutionary work - "roulette selection", in which alternatives have varying chances of being picked. I've put a function to do this, random_proportional_selection, in my fork's multiparam_funcs branch's math_util.py, but have yet to integrate it into, say, configurable chances for the likelihoods of different functions being selected.
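For reference, roulette (fitness-proportional) selection amounts to something like the following - a generic stand-in for what random_proportional_selection does, not the fork's actual code:

```python
import random

def roulette_select(items, weights):
    # Pick one item with probability proportional to its non-negative weight.
    total = sum(weights)
    r = random.uniform(0.0, total)
    upto = 0.0
    for item, weight in zip(items, weights):
        upto += weight
        if r <= upto:
            return item
    return items[-1]  # guard against floating-point round-off
```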

evolvingfridge commented 7 years ago

I meant, instead of random(), using various stochastic-process functions - like Random Walk functions in finance and Brownian Motion in physics; random() could be implemented with jump diffusion. This could be used for both connection/node and attribute mutations. choice is a uniform function, but it could also be implemented as a stochastic process. Please take a look here; I made this fork a long time ago for such an experiment, but I'm not sure how much sense it makes :)
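For readers unfamiliar with the term, a minimal sketch of what a jump-diffusion-style mutation step could look like (the function name and parameters are illustrative, not taken from the linked fork):

```python
import random

def jump_diffusion_mutate(value, drift_stdev, jump_rate, jump_stdev):
    # Small Brownian-style drift on every mutation, plus an occasional larger
    # "jump" (jump_rate is a per-mutation stand-in for the Poisson intensity).
    step = random.gauss(0.0, drift_stdev)
    if random.random() < jump_rate:
        step += random.gauss(0.0, jump_stdev)
    return value + step
```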

drallensmith commented 7 years ago

Interesting! I can see the idea for continuous attributes, I think - on top of the current mutation by (in essence) Brownian motion (the addition of a zero-mean gaussian) plus intermittent restarts, it would add the possibility of an abrupt change that is still influenced by the starting point - say, using a uniform distribution centered around zero instead of (or, rather, as well as) the current zero-mean gaussian. What are you thinking of in terms of non-continuous attributes? (Having mutation rates themselves evolve - including correlations at anything other than the whole-population level - has the potential problem that evolution looks only at what works now, not what will be better in the future, so it tends to produce overly-conservative mutation rates. Mutation rates that influence what a change is to, rather than dictating whether a change happens at all, have worked better.)

drallensmith commented 7 years ago

I have, incidentally, been considering whether it would be worth making uniform vs. gaussian initialization an evolvable parameter; it would need to convert to something with the same equivalent standard deviation (for a uniform distribution, range/sqrt(12.0)) to avoid biases from overly-conservative evolution, as mentioned above. (The uniform-distribution code treats init_stdev as if it were range*0.25 - instead of something like range*0.289 - to make it easier to specify a uniform distribution with a given range, BTW.)
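A quick check of those numbers, assuming (as above) that the uniform init spans mean ± 2*init_stdev:

```python
import math, random, statistics

init_stdev = 1.0                                  # value given in the config
half_width = 2.0 * init_stdev                     # uniform init spans mean +/- 2*init_stdev
true_stdev = (2.0 * half_width) / math.sqrt(12.0) # range/sqrt(12) ~= 1.155, not the nominal 1.0

samples = [random.uniform(-half_width, half_width) for _ in range(100000)]
print(true_stdev, statistics.pstdev(samples))     # both ~1.15
```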

drallensmith commented 7 years ago

One could also do this by evolving a continuous parameter for the chance that, each time an initialization or mutation happens, a gaussian vs. a uniform distribution (with equal standard deviations) is used.
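A minimal sketch of that idea (names are illustrative; uniform_chance would be the evolvable parameter):

```python
import math
import random

def perturb(stdev, uniform_chance):
    # Both branches are scaled to the same standard deviation,
    # so neither distribution is favored by step size alone.
    if random.random() < uniform_chance:
        half_width = stdev * math.sqrt(3.0)  # uniform on [-h, h] has stdev h/sqrt(3)
        return random.uniform(-half_width, half_width)
    return random.gauss(0.0, stdev)
```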

evolvingfridge commented 7 years ago

I was thinking individual genomes would have different mutation rates - some genomes could be more aggressive in updating attributes and others more passive (influencing what a change is to). Right now, about 95% of attribute mutations fall within 2 standard deviations. Using stochastic functions allows various distributions; for example, if a genome is stagnating it could use a more aggressive mutation_rand() function, mutating the kurtosis (the blue and red distributions). From attributes.py: self.clamp(value + gauss(0.0, mutate_power), config) - here gauss would also become an attribute of the genome (using jump diffusion instead of gauss, as an example).
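A hypothetical sketch of that direction (genome.mutate_dist and genome.mutate_power as per-genome evolvable attributes are illustrative, not existing neat-python fields):

```python
import random

def mutate_attribute(value, genome, min_value, max_value):
    # The genome carries its own mutate_power and distribution choice,
    # both of which could themselves mutate ("aggressive" vs "passive").
    if genome.mutate_dist == 'gauss':
        delta = random.gauss(0.0, genome.mutate_power)
    else:  # heavier-tailed option a stagnating genome might switch to
        delta = random.gauss(0.0, genome.mutate_power)
        if random.random() < 0.1:  # occasional jump, jump-diffusion style
            delta += random.gauss(0.0, 3.0 * genome.mutate_power)
    return max(min_value, min(max_value, value + delta))
```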

evolvingfridge commented 7 years ago

Thanks for pointing out overly-conservative mutation rates; I did not think about that, but I don't see as of now how this would happen if jump diffusion is used, because genomes in the population, instead of being just normally distributed, would have expansion and contraction phases, with less overlap. Talking off the top of my head - I have not verified it with pen and paper :)

drallensmith commented 7 years ago

Quite welcome. If the population as a whole had fluctuations in mutations, that could be helpful - alternating between structural and non-structural, for instance; control of "bloat" via alternating between addition of structure and removal of structure, with alternations controlled by genome complexity + whether stagnation was happening too much, is another technique in use (e.g., in MultiNEAT).
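A toy sketch of that alternation (my reading of the idea, not MultiNEAT's actual implementation; the names and thresholds are illustrative):

```python
def next_phase(current_phase, mean_complexity, complexity_ceiling, complexity_falling):
    # Add structure until the population gets too complex,
    # then remove structure until simplification stops making progress.
    if current_phase == 'complexify' and mean_complexity > complexity_ceiling:
        return 'simplify'
    if current_phase == 'simplify' and not complexity_falling:
        return 'complexify'
    return current_phase
```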

Regarding stagnation, I've had thoughts of:

evolvingfridge commented 7 years ago

Just to make sure I understand it correctly: from our conversation, it is a good idea to move guess, random(), and possibly choice into genome_config initially, before implementing stagnation functions?

drallensmith commented 7 years ago

By "guess", do you mean "gauss"? I'm not sure whether moving functions into genome_config would be best, or whether checking the genome config for which function to use would be best (since it could well vary by what attribute is being controlled).

evolvingfridge commented 7 years ago

Yes, sorry about the misspelling - I meant it should be implemented similarly to the multiparam fork. It looks like, if there is no knowledge of the past, a Gaussian distribution is the best option for mutation, but a Cauchy distribution is still an option; additionally, DEAP has an option for a polynomial distribution, and jump-diffusion models use a Poisson distribution for the jumps (complicated).

Overall, I believe mutation functions for structural and non-structural mutation should be exposed to the configuration in the same way as aggregation or activation functions.
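Roughly along these lines - a hypothetical registry, loosely modeled on how activation/aggregation functions are registered and then selected by name from the config (class and option names are illustrative):

```python
import random

class MutationFunctionSet:
    def __init__(self):
        self.functions = {}
        self.add('gauss',   lambda power: random.gauss(0.0, power))
        self.add('uniform', lambda power: random.uniform(-power, power))

    def add(self, name, function):
        self.functions[name] = function

    def get(self, name):
        return self.functions[name]

# A genome config section could then look something like:
#   mutation_default = gauss
#   mutation_options = gauss uniform cauchy
```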

Thanks for so many :+1:; I was not aware of mutation schemes when I opened the issue.

drallensmith commented 7 years ago

BTW, re evolution being too conservative on mutation parameters if allowed to be - see Natural Selection Fails to Optimize Mutation Rates for Long-Term Adaptation on Rugged Fitness Landscapes.

Similarly as to the multiparam fork - as in, multiple mutation functions with possibly-shared parameters? Cauchy would certainly be interesting, although it may well slam the variable into the min_value or max_value very frequently (admittedly, as with multiparam_relu for the xor problem, that can be the exact right thing to do!), and proper parameterization of it could be difficult. (OTOH, it might well be the ideal sort to use if starting with a crossover variant that blended the input continuous variables instead of choosing one parent's at random, and/or after something like gene duplication.) Polynomial distribution mutation is an interesting one - I'll have to take a further look at that; not sure re Poisson.
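For illustration, a Cauchy mutation step might look like the following sketch (names are illustrative); the heavy tails mean large deltas are common, which is why clamping often lands the value exactly on min_value or max_value:

```python
import math
import random

def cauchy_mutate(value, scale, min_value, max_value):
    # Standard Cauchy sample via the inverse-CDF method, scaled and clamped.
    delta = scale * math.tan(math.pi * (random.random() - 0.5))
    return max(min_value, min(max_value, value + delta))
```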

Structural vs non-structural: Yes, it's a difficult one. Re the minimum # of neurons needed:

Production of a sufficiently diverse population: Also a very interesting one (and with a lot of different possibilities for defining diversity in the first place - structural, structural+non-structural (as with NEAT speciation currently), diversity of satisfying multiple objectives, behavioral diversity in various respects...).

Configurability of mutation functions - I definitely agree, provided we're careful to keep the networks from being too conservative in that. (And thanks for the mention of mutation schemes - I think I read about that a while back, but it was quite a while back...)

evolvingfridge commented 6 years ago

As usual, your references to publications are exactly what I needed to read - thank you!

Are fitness landscapes and fractured decision spaces the same problem? (I think they are at least very closely related.) I am still reading the first paper; long-term vs. short-term adaptation is what was on my mind when I mentioned expansion and contraction phases.

The fractured-problems publication answers many of my previous questions on classification of input space and comparison of NEAT variations (I will post more questions/ideas on related issues that were opened before). Total variation of a function was one of my missing links; I reproduced eight versions of the sine-wave function-approximation problem and the total variation equation for 1-D space (see attached image) - I am not sure I understand what the supremum does, and I have a slight discrepancy in the variation. I think it would be a good addition to neat-examples, but I'm struggling to figure out how the NEAT input/output was set up for the problem in the publication. Any suggestion on how the problem was set up?
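In case it helps with the supremum: refining a partition can only increase the sum of absolute differences, so for sampled data the supremum is attained by using every sample point, and the computation reduces to a simple sum (a minimal sketch, not the paper's code):

```python
import math

def total_variation_1d(values):
    # Sum of absolute successive differences over the sampled points.
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

# e.g. one full period of a sine wave has total variation 4 (0 -> 1 -> -1 -> 0)
xs = [math.sin(2.0 * math.pi * i / 1000.0) for i in range(1001)]
print(total_variation_1d(xs))  # ~= 4.0
```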

About configuration of mutation functions: I am thinking of adding configurability to the multiparam fork - or would it be better to add it to a neat-python fork? (It looks like I will be able to reuse some of your code for mutation configurability.)

P.S. I should have paid more attention when you first mentioned rbf-neat :)

drallensmith commented 6 years ago

Quite welcome; I like pointing people toward useful information.

In a fractured decision space, a rugged fitness landscape - one rapidly changing from "place" to "place" - should result if you consider both the state and the action together as determining the "place" in the landscape. (If it isn't that rugged, that may indicate something like this: while the optimum action may change rapidly, the differences between it and a satisfactory action are likely inconsequential.)

That link isn't working for me - will try again later.

If you can reuse some code, then it would make more sense to use the multiparam fork/branch; otherwise, I would put it in a neat-python fork.

Heh. Admittedly, it looks like, if there are more than a few inputs, RBF-NEAT alone may not be able to do the job - too many variables. I'm currently considering (once enough data is available) pre-training a layer of RBF nodes to be a basis for MM-NEAT (an RBF output would be the first added connection for each new preference node) - as is done with deep neural networks, using something like an autoencoder and/or a scheme based on information content; for the latter, see Evolving Deep LSTM-based Memory networks using an Information Maximization Objective.

evolvingfridge commented 6 years ago

I think link works now.

I am probably wrong and was not clear about what I meant in my previous comment, but for a complex environment such as a soccer game, the fitness function can change rapidly over time, the same as the optimal action. In other words, selecting a fitness function (priority) for a player is an optimal decision in a fractured problem space.

I am about to try the sine experiment; it looks like the setup uses 1000 data points with 10 inputs, and the output is not specified :(