PonyGE / PonyGE2

PonyGE2: grammatical evolution and variants in Python
GNU General Public License v3.0
153 stars 90 forks source link

We should add PTC2 #63

Open dvpfagan opened 7 years ago

dvpfagan commented 7 years ago

We should add PTC2-See here Robin Harper applied it to GE and Sean Luke had previously used it in GP.

If we are striving to make PonyGE state of the art we should look to include stuff like this

Dave

mfjoneill commented 7 years ago

We should definitely add PTC2, and also to a linear random initialisation add an enhanced variant which removes duplicates and invalids. Miguel's recent paper shows the superiority of these two initialisers (using linear operators) over all other variants https://link.springer.com/article/10.1007/s10710-017-9309-9

Many thanks, Mike.

jmmcd commented 2 years ago

Added Nicolau's RVD.

PTC2 would be more effort, will not attempt for now.

jmmcd commented 2 years ago

Suggestion: the default initialisation is currently PI_grow, but according to Nicolau RVD is better (and PTC variants are better still). For now, until/unless we add new initialisation operators, we could set the default to be RVD.

Thoughts?

jmmcd commented 2 years ago

According to Nicolau: "Harper [8] used PTC2 to initialise individuals in GE, with a uniform distribution of (derivation) tree-sizes; in his implementation, the size of a derivation tree is the number of grammar expansions performed."

Maybe we could get most of the benefit of PTC2 but still without looking at trees, as follows. We measure the size (not depth) of a tree in the same way as Harper. We use a discard approach (same as RVD) to make sure that we achieve the desired distribution of tree sizes.

jmmcd commented 2 years ago

init_results

jmmcd commented 2 years ago

According to this, sensible initialisation is worse than a pure random linear genome! I'm going to cc Miguel.

dvpfagan commented 2 years ago

Makes sense to try get PTC2 or a good approximation in the codebase. From memory, Robins ultimate goal was towards evolving/coevolving code/behaviour or robots. I think the other approaches were mainly tested on regression-based problems (I am also aware some people regard all problems as a form of regression problems) Anyway it think ptc2 or some equivalent would be great.

On Sun, 17 Oct 2021 at 23:21, James McDermott @.***> wrote:

According to this, sensible initialisation is worse than a pure random linear genome! I'm going to cc Miguel.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/PonyGE/PonyGE2/issues/63#issuecomment-945204937, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHTHOWXPBGGKHFUWW2NPN3UHND67ANCNFSM4DHRWNZA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

nicolaum73 commented 2 years ago

Hi all, yes indeed, I found SI to be one of the worst performers in my study. A very important thing to note here is that I used linear GE, with linear crossover and codon-mutation (effective versions of these, i.e. choosing the crossover and mutation points in the coding region of the genotype strings).

@jmmcd using a discard approach to "simulate" PTC2 might work, yes. Although I did not use any tree structures in my code. All you need is a grammar labelling similar to what I show in Table 1, which I assume you must have already, given that you have implemented SI. Then the algorithm shown in Fig. 3 can be implemented without trees.

If you want to implement PTC2 like Sean and Robin, then you need to keep track of tree depth as well (the approach I labelled PTC2D), because they establish a maximum depth for the syntax (Sean) or derivation (Robin) trees generated. I worked purely with size, without any depth limits, and found that 1) I get far better results without it; and 2) the algorithm is actually simpler to implement this way. Tree depth limits really make absolutely no sense in (linear) GE.

I'm sure SI will perform better when tree-based structures and operators are used. I also think that Harper's PTC2 approach, which uses a depth limit, rather than a pure "size" limit (which is what I use and propose), will probably do better when tree structures are manipulated. But PTC2 without tree depth limit will still be the overall better approach, I believe.

jmmcd commented 2 years ago

Thanks Miguel, very useful to get your insight.

I did not use any tree structures in my code. All you need is a grammar labelling similar to what I show in Table 1, which I assume you must have already, given that you have implemented SI.

Ah... we have SI, but it's implemented using tree structures. But yes, somewhere deep inside the derivation trees of PonyGE2 there is a labelling like this, I think.

I'm sure SI will perform better when tree-based structures and operators are used.

So maybe our SI is already good.

A natural first step would be to implement a discard approach to approximate PTC2 (measuring size only, not depth) (this would be easy) and then run a small version of Miguel's experiment (RND, RVD, SI aka RHH, PI_GROW, simulated PTC) on a few datasets.