Gab0 / japonicus

Genetic Algorithm for Gekko Trading Bot.
MIT License

Questions about locales, individuals, crossover and life... #154

Open nmikaty opened 6 years ago

nmikaty commented 6 years ago

Some questions I haven't been able to answer by myself just by looking at the code. @Gab0 if you get the time I would love to read your answers.

  1. After the initial set of locales, can new locales spawn at any time or is there something triggering their appearance? Is there a way to increase the number of these new locales?

  2. Do individuals travel between locales?

  3. If a very fit individual is found, will its genome survive through all epochs? Or is there a chance it disappears (looks that way)?

  4. How do normal crossovers happen? Is the total genome (set of parameters) divided in two and recombined with the one of another individual (giving the possibility of only two different offspring for two parents)? Or is it happening differently (giving the possibility for more combinations from two parents)?

  5. With the -gc option, does Genesize = 3 mean that the total genome (set of parameters) will be divided in 3 instead of 2? And if yes will it give 2^3 possible offspring (minus the two parents), or still only 2?

  6. With the -gc option, how is Density interacting with the above Genesize?

  7. From my experience using Japonicus I get better results with lots of locales, 30-50 individuals per locale, and generally my best results are within the first 100 epochs. What about you?

Cheers :)

Gab0 commented 6 years ago

Hello sir @nmikaty, you wanted answers so here are some answers x]

  1. They appear randomly. I think the creation chance is hardcoded, maybe set to the same value as localeExploceChance; I need to check this.

  2. When a locale ends (or explodes, as we like to call it here - wtf?), some of its individuals do travel to other locales, and only in this situation. It would be nice to improve the migration mechanics; it seems it would decrease overfitting but would take longer to reach optimal results.

  3. Each locale has a hall of fame, so copies of the best individuals keep spawning at random.

  4. Depending on the genome representation, we either separate the genes among different chromosomes (-gc) or not (-g). The cut happens for each chromosome (or for the full genome) of each partner, and then one half from each parent is merged to make the children's chromosomes. I think this is the standard way.

  5. Genesize is the size of a gene in the chromosome or genome array; for example, thresholds.up is represented like [..., 3, 6, 4, ...] if Genesize == 3. It is designed so that a gene is more prone to being cut inside it when the size is bigger.

  6. Density sets the number of chromosomes by defining how many parameters (aka genes) are packed in each chromosome. It does not interact directly with Genesize; over the generations the two have combined effects, but I don't know exactly what happens.

  7. That's controversial XD I tested many settings and had help from many testers... yeah, the thing is to have lots of locales and 70 to 100 or more individuals. Those settings take a lot of time to run but yield better results. There is a limit on popsize though, beyond which adding more won't matter... the limit should be around 100.

After all, what we have here is not complicated... given some running time the GA will output parameters that work on the input dataset(s). Whether it overfits depends more on the strategy itself (most strats do overfit badly). Man, maybe you could implement the individual migration things?
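To make answers 4 to 6 concrete, here is a minimal sketch of a one-point cut applied either to the whole genome (like -g) or once per chromosome (like -gc). It is not the japonicus implementation: the function names, the flat-list genome layout, and the way `density` and `genesize` set the chromosome length are all assumptions made for illustration.

```python
import random


def one_point_crossover(a, b, rng):
    """Cut both sequences at the same random point and swap the tails."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def split_into_chromosomes(genome, density, genesize):
    """Pack `density` genes (each `genesize` numbers long) per chromosome."""
    step = density * genesize
    return [genome[i:i + step] for i in range(0, len(genome), step)]


def crossover_by_chromosome(a, b, density, genesize, rng):
    """Apply one cut per chromosome instead of one cut for the whole genome."""
    child1, child2 = [], []
    chromosomes_a = split_into_chromosomes(a, density, genesize)
    chromosomes_b = split_into_chromosomes(b, density, genesize)
    for ca, cb in zip(chromosomes_a, chromosomes_b):
        if len(ca) > 1:  # a one-number chromosome cannot be cut
            ca, cb = one_point_crossover(ca, cb, rng)
        child1 += ca
        child2 += cb
    return child1, child2


rng = random.Random(0)
# Hypothetical genome: 4 parameters with Genesize 3, so 12 numbers in total.
parent_a = [0] * 12
parent_b = [9] * 12
whole = one_point_crossover(parent_a, parent_b, rng)               # like -g
chunked = crossover_by_chromosome(parent_a, parent_b, 2, 3, rng)   # like -gc
print(whole[0])
print(chunked[0])
```

With Density = 2 the genome is cut once inside each of the two chromosomes, so a child can mix its parents in more places than a single whole-genome cut allows. A larger Genesize makes each gene span more positions, which is why a cut is more likely to land inside a gene.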

nmikaty commented 6 years ago

Hi @Gab0 ,

Thank you for these explanations! Everything makes much more sense now :)

Today I took the time to dig a little deeper into the code. I found some details I can share here:

  1. The creation chance of the locales is hardcoded in parallel_world.py, at 0.01, which explains why so few locales are created after the initial set. We could increase this number, and maybe cap the number of locales at the initial number to avoid having too many. This way we would have a relatively stable number of locales at all times.

  2. I found out that migration already happens at every epoch. Each locale has a chance to exchange a few random individuals with other locales. Interestingly, the chance increases if the other locales are "close". (Apparently each locale gets a random position when it is created. The position is defined by random plane coordinates x and y and has nothing to do with the position in the dataset.) Here is the main code in parallel_world.py for migration:

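    # --APPLY MIGRATION BETWEEN LOCALES;
    if len(World.locales):
        S, D = False, False
        # Visit every unordered pair of locales once.
        LocalePairs = itertools.combinations(World.locales, 2)
        for L in LocalePairs:
            distance = calculateDistance(L[0].position, L[1].position)
            # Scale the distance by the world's maximum distance.
            distance_weight = distance / World.maxdistance
            # Closer pairs (smaller weight) pass this check more often.
            if random.random() > distance_weight:
                # Exchange individuals in both directions.
                World.migration(L[0], L[1], (1, 7))
                World.migration(L[1], L[0], (1, 7))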

    When a locale explodes, most (or all?) of its individuals also travel to other locales (again, the closer the other locale, the greater the number of individuals).

So I think it is key to keep a good number of locales alive at the same time to maintain the evolutionary dynamic.
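As a rough illustration of the distance rule in that snippet: since `random.random()` is uniform on [0, 1), the check `random.random() > distance_weight` passes with probability 1 - distance / maxdistance. The sketch below assumes `calculateDistance` is a plain Euclidean distance and uses a made-up maximum distance of 100; neither is confirmed by the code quoted above.

```python
import math


def calculate_distance(p, q):
    """Euclidean distance between two (x, y) positions (assumed metric)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def migration_chance(p, q, max_distance):
    """Probability that random.random() > distance / max_distance."""
    weight = calculate_distance(p, q) / max_distance
    return max(0.0, 1.0 - weight)


MAX_DISTANCE = 100.0  # hypothetical stand-in for World.maxdistance

near = migration_chance((0, 0), (6, 8), MAX_DISTANCE)    # distance 10
far = migration_chance((0, 0), (60, 80), MAX_DISTANCE)   # distance 100
print(near, far)  # about 0.9 for the near pair, 0.0 for the far pair
```

Under these assumptions, two locales at the maximum distance never exchange individuals, while neighbours exchange them at almost every epoch.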

  4-6. I get a much better picture of how crossovers happen now. To me it makes sense to use the -gc option and Density when we have a lot of parameters. I also get what Genesize is, but I still don't understand how it is useful in our GA.

Edit: After running a few optimizations I can say that the -gc option is indeed more efficient with my 18-parameter strategy (using Density = 4). I also understand that Genesize affects the number of possible values for each parameter. For example, if your Genesize is 2, your parameters will have 10^2 possible values. If your Genesize is 3, then 10^3 possible values. These numbers are inexact, but the idea is there.
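The 10^Genesize idea can be sketched as follows, assuming each gene is a list of Genesize decimal digits that is mapped linearly onto the parameter's range. The mapping and the function names are assumptions for illustration; the real decoding in japonicus may differ, as the "inexact" caveat above suggests.

```python
def distinct_values(genesize):
    """Number of values a gene of `genesize` decimal digits can encode."""
    return 10 ** genesize


def decode_gene(digits, low, high):
    """Map a list of decimal digits linearly onto the range [low, high]."""
    value = int("".join(str(d) for d in digits))  # e.g. [3, 6, 4] -> 364
    steps = distinct_values(len(digits)) - 1      # highest encodable value
    return low + (high - low) * value / steps


print(distinct_values(2), distinct_values(3))  # 100 and 1000 encodable values
print(decode_gene([3, 6, 4], 0.0, 99.9))       # close to 36.4
```

A larger Genesize therefore gives the GA a finer grid of candidate values per parameter, at the cost of a longer genome to search.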

I will try to play with locale creation chance when I get the time. And share the code if it's worth it.

Thanks again @Gab0 ;)

Gab0 commented 6 years ago

Oh so I write this cool stuff and forget about it XD jkjk. But yeah, you are right @nmikaty, a constant number of locales would be better, and it should be changed to that.

Maybe I did not remember the migration because it is only a draft of what it could be. It's a very cool concept that can be expanded in many ways, maybe with different enforced population sizes per locale, or with locale division and merging (of their populations), idk... I will try to get something done on this over the week, let's have some ideas.

nmikaty commented 6 years ago

Hello @Gab0 ,

For the locales I think we could simply use NBLOCALE / localeExpirationAge as the chance of creating a new locale at each epoch. So for example if I start with 10 locales and a localeExpirationAge of 100, I would have a 10/100 = 0.1 (10%) chance of creating a new locale at each epoch. This means I will create one locale every 10 epochs on average, so I will end up with approximately 10 locales alive at all times. Of course, as we start with 10 locales, we will have more running during the first 100 epochs or so. But by the time the initial locales die we will have already created 10(ish) others of different ages, allowing migrations and a smooth evolution.

I actually really like what you already did with migration. The only thing I'm concerned about is that when a locale explodes, some individuals will migrate to other locales, but some others won't. And as the process is random, we may end up losing the fittest individuals of the locale. I'm not an expert in genetics or genetic algorithms, but my instinct tells me we should give the fittest individuals a better chance of survival. It would be easy to use the tools.selBest() function to make sure the best individuals get priority to migrate when a locale explodes.
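The arithmetic can be checked with a small simulation. This is only a sketch under a simplifying assumption: every locale dies exactly when it reaches the expiration age, whereas the real explosion rule may be probabilistic. The function names are made up for the example.

```python
import random


def creation_chance(nb_locale, locale_expiration_age):
    """Per-epoch chance of creating one new locale."""
    return nb_locale / locale_expiration_age


def simulate_locale_count(nb_locale, expiration_age, epochs, seed=0):
    """Return the number of live locales at the end of each epoch."""
    rng = random.Random(seed)
    chance = creation_chance(nb_locale, expiration_age)
    ages = [0] * nb_locale  # the initial locales all start at age 0
    history = []
    for _ in range(epochs):
        # Age every locale and drop those that reach the expiration age.
        ages = [age + 1 for age in ages if age + 1 < expiration_age]
        if rng.random() < chance:
            ages.append(0)
        history.append(len(ages))
    return history


history = simulate_locale_count(10, 100, 2000)
late = history[500:]  # skip the start-up phase with the initial locales
print(creation_chance(10, 100))
print(sum(late) / len(late))
```

With a 0.1 chance per epoch and a lifetime of about 100 epochs, the expected number of live locales is 0.1 × 100 = 10, and the long-run average printed by the simulation should hover around that value.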
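A minimal sketch of that idea, written with the standard library only so it runs on its own. The `select_best` helper sorts by fitness and keeps the top k, which is the behaviour expected from `tools.selBest()` for a single fitness value; the `Individual` class, the `elite` parameter, and `pick_migrants` are hypothetical names, not japonicus code.

```python
import random
from dataclasses import dataclass


@dataclass
class Individual:
    name: str
    fitness: float


def select_best(individuals, k):
    """Return the k fittest individuals, best first."""
    ranked = sorted(individuals, key=lambda ind: ind.fitness, reverse=True)
    return ranked[:k]


def pick_migrants(population, k, elite, rng):
    """Pick k migrants: the `elite` fittest first, the rest at random."""
    best = select_best(population, min(elite, k))
    chosen = {id(ind) for ind in best}
    rest = [ind for ind in population if id(ind) not in chosen]
    extra = rng.sample(rest, min(k - len(best), len(rest)))
    return best + extra


population = [Individual(f"ind{i}", float(i)) for i in range(10)]
migrants = pick_migrants(population, k=4, elite=2, rng=random.Random(1))
print([m.name for m in migrants])
```

Keeping part of the selection random preserves some diversity among the migrants, while the elite slots make sure the top individuals are never left behind in the exploding locale.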

Gab0 commented 6 years ago

@nmikaty Sure, that can be a way to create locales... but if we are bothering to improve locale mechanics, we can ditch those "cold" ways of generating them and adopt more organic stuff that depends on the simulation itself. Like we only create locales by splitting existing locales or something, so no individual would be wasted, as you said, and the division would be triggered by a condition or something (maybe evolution has stalled for too long?). Then there should be a way to balance the chance of this to keep a stable number of locales... I'll try to test every method.

nmikaty commented 6 years ago

Interesting yes. A locale could survive as long as it's evolving in the right direction, then explode/split if it's stable or evolving in the wrong direction for too long. It could be interesting to revive the fittest individuals from the locale's hall of fame before exploding/splitting it. To make sure we keep those good genomes around.
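One way to sketch this proposal: track the best fitness of a locale per epoch, flag the locale as stagnant when the recent epochs bring no improvement, and re-add hall-of-fame genomes before splitting. Everything here is hypothetical (the `patience` parameter, the function names, genomes stored as tuples); nothing like it is confirmed to exist in japonicus.

```python
def is_stagnant(best_fitness_history, patience):
    """True if the last `patience` epochs did not beat the earlier best."""
    if len(best_fitness_history) <= patience:
        return False  # not enough history to judge yet
    earlier_best = max(best_fitness_history[:-patience])
    recent_best = max(best_fitness_history[-patience:])
    return recent_best <= earlier_best


def revive_hall_of_fame(population, hall_of_fame):
    """Return the population plus any hall-of-fame genomes it has lost."""
    present = set(population)
    missing = [genome for genome in hall_of_fame if genome not in present]
    return population + missing


improving = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
stalled = [1.0, 2.0, 3.0, 3.0, 3.0, 3.0]
print(is_stagnant(improving, patience=3), is_stagnant(stalled, patience=3))

population = [(1, 2), (3, 4)]
hall_of_fame = [(3, 4), (9, 9)]
print(revive_hall_of_fame(population, hall_of_fame))
```

A locale flagged by `is_stagnant` would then be the one to explode or split, with its revived population handed to the migration step.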

CryptoCoeus commented 6 years ago

Thanks for the instructive conversation. I have a question about @nmikaty's third question :)

So, you said this:

  3. If a very fit individual is found, will its genome survive through all epochs? Or is there a chance it disappears (looks that way)?

What is considered a 'very fit individual'? What is the fitness function implemented in japonicus?

Cheers,

nmikaty commented 6 years ago

Hi @CryptoCoeus ,

The fitness is defined by the result of the Gekko backtest for the individual on a specific time period. It also takes into account the market movement, so basically the fittest individual is the most profitable on that time period.

Cheers