Open DeaglanBartlett opened 9 months ago
To begin, in commit fbbc17f02ae62a9413866acf181e125105d3cfc9 we generate the trees in a smarter way. Rather than starting from scratch at each complexity, we note that there are two ways of making trees at complexity c:
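As a sketch of the idea (not the ESR implementation; the operator counts below are made-up placeholders), taking complexity to be the number of nodes in the tree: a tree of complexity c has either a unary root over a subtree of complexity c - 1, or a binary root whose two children's complexities sum to c - 1. Counting trees with this recursion:

```python
from functools import lru_cache

# Hypothetical basis-set sizes, e.g. {x, a, 1}, {exp, log}, {+, *}.
N_NULLARY, N_UNARY, N_BINARY = 3, 2, 2

@lru_cache(maxsize=None)
def n_trees(c):
    """Count expression trees of complexity c (= number of nodes)."""
    if c == 1:
        return N_NULLARY  # a single leaf node
    # Route 1: unary operator applied to a tree of complexity c - 1.
    total = N_UNARY * n_trees(c - 1)
    # Route 2: binary operator; children's complexities sum to c - 1.
    for left in range(1, c - 1):
        total += N_BINARY * n_trees(left) * n_trees(c - 1 - left)
    return total
```

Trees built this way reuse the already-generated lower-complexity trees instead of starting from scratch at each complexity.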
We also reduce the number of trees generated by telling the generator whether the arguments of a binary operator commute. If they do, we can restrict comp(right) <= comp(left) for the operator's two children.
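For example (a minimal sketch with a hypothetical helper, not ESR's generator), enumerating the (left, right) child-complexity splits available to a binary root at a given complexity, with and without the commutative restriction:

```python
def child_splits(c, commutative):
    """Child-complexity pairs for a binary root of complexity c.

    The root uses one node, so the children's complexities sum to c - 1.
    """
    total = c - 1
    splits = [(left, total - left) for left in range(1, total)]
    if commutative:
        # Arguments commute, so keep only one ordering of each pair.
        splits = [(l, r) for (l, r) in splits if r <= l]
    return splits
```

At complexity 6 this cuts the four ordered splits (1,4), (2,3), (3,2), (4,1) down to just (3,2) and (4,1), since the mirrored trees would represent the same function.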
We also make a couple of other changes:
We find that this reduces the "total" number of equations for the "core_maths" basis set, since we have removed trees which are merely permutations of another tree representing the same function, as well as some functions which are guaranteed to have larger description lengths than a lower-complexity counterpart.
In 166ece85c66170d09257f902a8d57ce6378e438d we now convert all functions to strings using sympy, but only one at a time, so there is never more than one function stored in memory as a sympy object. We do the simplest thing to identify duplicates: we check which strings are identical. This identifies some duplicates:
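A minimal sketch of the one-at-a-time deduplication (the converter argument stands in for the sympy conversion step; the names here are hypothetical, not ESR's API):

```python
def unique_by_string(trees, to_canonical_string):
    """Keep the first tree for each distinct canonical string.

    Trees are converted one at a time (in ESR this would go through
    sympy, e.g. str(sympy.sympify(...))), so only one converted object
    needs to be alive at any moment and memory use stays flat.
    """
    seen = set()
    unique = []
    for tree in trees:
        key = to_canonical_string(tree)
        if key not in seen:
            seen.add(key)
            unique.append(tree)
    return unique
```

Exact string matching catches trivially identical expressions, but re-parametrised versions of the same function produce different strings and are not caught at this stage.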
although it leaves more unique equations than the default ESR method does. This is because we have not yet attempted to identify duplicates which are re-parametrised versions of the same function. This is the next task.
Note that this method is much faster than the old one, taking 0.366s to generate all equations and identify duplicates up to and including complexity 6 for the core_maths function set. We are currently not using any parallelisation for the new method, so this could also be improved.
In the ESR paper it is stated that the main obstacle to reaching higher complexity is the memory requirement of the function generation process, since sympy objects are kept in memory throughout. Ways of improving this should be implemented.