Open DeaglanBartlett opened 9 months ago
To begin, in commit fbbc17f02ae62a9413866acf181e125105d3cfc9 we generate the trees in a smarter way. Rather than starting from scratch at each complexity, we note that there are two ways of making trees at complexity c:
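As a sketch of the idea (not the ESR implementation; the operator counts below are made-up placeholders), taking complexity to be the number of nodes in the tree: a tree of complexity c has either a unary root over a subtree of complexity c - 1, or a binary root whose two children's complexities sum to c - 1. Counting trees with this recursion:

```python
from functools import lru_cache

# Hypothetical basis-set sizes, e.g. {x, a, 1}, {exp, log}, {+, *}.
N_NULLARY, N_UNARY, N_BINARY = 3, 2, 2

@lru_cache(maxsize=None)
def n_trees(c):
    """Count expression trees of complexity c (= number of nodes)."""
    if c == 1:
        return N_NULLARY  # a single leaf node
    # Route 1: unary operator applied to a tree of complexity c - 1.
    total = N_UNARY * n_trees(c - 1)
    # Route 2: binary operator; children's complexities sum to c - 1.
    for left in range(1, c - 1):
        total += N_BINARY * n_trees(left) * n_trees(c - 1 - left)
    return total
```

Trees built this way reuse the already-generated lower-complexity trees instead of starting from scratch at each complexity.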
We also reduce the number of trees generated by telling the generator whether the arguments of a binary operator commute. If they do, we can restrict comp(right) <= comp(left) for the operator's two children.
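For example (a minimal sketch with a hypothetical helper, not ESR's generator), enumerating the (left, right) child-complexity splits available to a binary root at a given complexity, with and without the commutative restriction:

```python
def child_splits(c, commutative):
    """Child-complexity pairs for a binary root of complexity c.

    The root uses one node, so the children's complexities sum to c - 1.
    """
    total = c - 1
    splits = [(left, total - left) for left in range(1, total)]
    if commutative:
        # Arguments commute, so keep only one ordering of each pair.
        splits = [(l, r) for (l, r) in splits if r <= l]
    return splits
```

At complexity 6 this cuts the four ordered splits (1,4), (2,3), (3,2), (4,1) down to just (3,2) and (4,1), since the mirrored trees would represent the same function.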
We also make a couple of other changes:
We find that this reduces the "total" number of equations for the "core_maths" basis set, since we have removed trees which are merely permutations of another tree representing the same function, as well as some functions which are guaranteed to have larger description lengths than a lower-complexity counterpart.
In 166ece85c66170d09257f902a8d57ce6378e438d we now convert all functions to strings using sympy, but only one at a time, so there is never more than one function stored in memory as a sympy object. We do the simplest thing to identify duplicates: we check which strings are identical. This identifies some duplicates:
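A minimal sketch of the one-at-a-time deduplication (the converter argument stands in for the sympy conversion step; the names here are hypothetical, not ESR's API):

```python
def unique_by_string(trees, to_canonical_string):
    """Keep the first tree for each distinct canonical string.

    Trees are converted one at a time (in ESR this would go through
    sympy, e.g. str(sympy.sympify(...))), so only one converted object
    needs to be alive at any moment and memory use stays flat.
    """
    seen = set()
    unique = []
    for tree in trees:
        key = to_canonical_string(tree)
        if key not in seen:
            seen.add(key)
            unique.append(tree)
    return unique
```

Exact string matching catches trivially identical expressions, but re-parametrised versions of the same function produce different strings and are not caught at this stage.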
although it leaves more unique equations than the default ESR method does. This is because we have not yet attempted to identify duplicates which are re-parametrised versions of the same function. This is the next task.
Note that this method is much faster than the old one, taking 0.366s to generate all equations and identify duplicates up to and including complexity 6 for the core_maths function set. We are currently not using any parallelisation for the new method, so this could also be improved.
In the ESR paper it is stated that the main obstacle to reaching higher complexity is the memory requirement of the function generation process, since sympy objects are kept in memory throughout. Ways of improving this should be implemented.