MilesCranmer / PySR

High-Performance Symbolic Regression in Python and Julia
https://ai.damtp.cam.ac.uk/pysr
Apache License 2.0

Allow `y` to be vector-valued? #35

Closed: patrick-kidger closed this issue 3 years ago

patrick-kidger commented 3 years ago

It would be nice to allow `y` to be vector-valued, essentially as a (parallelised?) shortcut for something like the following:

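```python
import pysr

# Fit one symbolic regression per output column of y (shape: [n_samples, n_outputs]).
expressions = []
for index in range(y.shape[-1]):
    yi = y[:, index]  # select the index-th target column
    expression_i = pysr.pysr(x, yi)
    expressions.append(expression_i)
```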

This links to (and is actually motivated by) my suggestion in #32: `as_pytorch` could then return a single `SymPyModule` wrapping a list of all the expressions, rather than one `SymPyModule` per expression, which would then have to be composed together in the above for loop.
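Each column of `y` is an independent regression problem, so the loop is trivially parallel. A minimal sketch of the proposed shortcut, assuming a hypothetical single-output fitter `fit_one(x, yi)` standing in for `pysr.pysr`; `fit_vector_valued` and `toy_fit` are illustrative names, and `toy_fit` is an ordinary least-squares placeholder (not symbolic regression) used only so the example runs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

import numpy as np


def fit_vector_valued(x: np.ndarray, y: np.ndarray,
                      fit_one: Callable[[np.ndarray, np.ndarray], Any],
                      max_workers: int = 4) -> List[Any]:
    """Fit each column of y (shape [n_samples, n_outputs]) independently.

    `fit_one` is a hypothetical single-output fitter standing in for pysr.pysr(x, yi).
    Results are returned in column order.
    """
    if y.ndim == 1:
        y = y[:, None]  # treat a 1-D target as a single output column
    columns = [y[:, index] for index in range(y.shape[-1])]
    # The per-column problems are independent, so they can be dispatched concurrently;
    # Executor.map preserves the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda yi: fit_one(x, yi), columns))


def toy_fit(x: np.ndarray, yi: np.ndarray) -> np.ndarray:
    """Placeholder fitter: ordinary least squares, NOT symbolic regression."""
    coeffs, *_ = np.linalg.lstsq(x, yi, rcond=None)
    return coeffs


x = np.random.default_rng(0).normal(size=(100, 2))
y = np.stack([3.0 * x[:, 0], x[:, 0] - 2.0 * x[:, 1]], axis=-1)
expressions = fit_vector_valued(x, y, toy_fit)
```

Returning a plain list keeps one result per output, which is the shape any later export step (e.g. a single module wrapping all expressions) would consume.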

MilesCranmer commented 3 years ago

Definitely agree. Actually it could be even nicer to just have the SymbolicRegression.jl backend loop over each y-component and return a list of equations at each complexity. Will think more about this.

MilesCranmer commented 3 years ago

Just added multi-output support (with multiple equations returned): SymbolicRegression.jl will now return a list of equations for each output.
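To picture the shape of this multi-output result, here is a sketch assuming each output gets its own list holding the best equation found at each complexity, as suggested earlier in the thread. The `(complexity, loss, equation)` tuples, the function names, and the candidate data are all hypothetical; this does not reproduce SymbolicRegression.jl's internals.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape for illustration only: (complexity, loss, equation).
Candidate = Tuple[int, float, str]


def best_per_complexity(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the lowest-loss equation found at each complexity, sorted by complexity."""
    best: Dict[int, Candidate] = {}
    for complexity, loss, equation in candidates:
        if complexity not in best or loss < best[complexity][1]:
            best[complexity] = (complexity, loss, equation)
    return [best[c] for c in sorted(best)]


def multi_output_hall_of_fame(per_output: List[List[Candidate]]) -> List[List[Candidate]]:
    """Return one per-complexity list of equations for each output of y."""
    return [best_per_complexity(candidates) for candidates in per_output]


per_output = [
    [(1, 2.0, "x0"), (3, 0.5, "x0 * x1"), (3, 0.9, "x0 + x1"), (5, 0.1, "cos(x0) * x1")],
    [(1, 1.5, "x1"), (3, 0.2, "x1 - 2.0")],
]
hall_of_fame = multi_output_hall_of_fame(per_output)
```

The outer list is indexed by output and the inner list by complexity, so outputs never compete with each other for a slot.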

This should be simple enough to import into PySR. The trickier part is the JAX and PyTorch backends! I'm not sure how to approach it: should we keep outputting a separate Torch/JAX model for each equation, or try to merge them? Merging is probably overcomplicated...

patrick-kidger commented 3 years ago

Is the PyTorch backend (do you mean exporting via sympytorch?) actually available? I don't see it from a quick glance through the code.

For sympytorch at least, it's possible to pass multiple equations to `SymPyModule`, i.e. vector-valued outputs are supported automatically.

MilesCranmer commented 3 years ago

The PyTorch backend hasn't been implemented yet; it's next on my todo list!

It's nice that vector equations are supported! It doesn't assume the equations share the same form with different parameter values, right? They can be completely different equations?

patrick-kidger commented 3 years ago

Yep, completely different equations.

MilesCranmer commented 3 years ago

Actually, on second thought, I'm not sure it will work given PySR's API: it outputs a list of equations for each feature of `y`, and when JAX/Torch export is turned on, it generates a separate function/module for each expression. I think the user will want to filter the equations for each feature of `y` separately, choose one per feature, and only then fuse them. I don't see an easy way to do this otherwise. I also don't know whether there'd be a speedup, since the expressions are assumed to be different and no operations are reused.
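The "filter per output, then fuse" workflow described here can be sketched in plain Python. This assumes a hypothetical record `(complexity, loss, callable)` per candidate equation and a made-up selection rule (lowest loss within a complexity budget); `select_per_output` and `fuse` are illustrative names, not part of PySR.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical record: (complexity, loss, callable implementing the equation).
Equation = Tuple[int, float, Callable[[Sequence[float]], float]]


def select_per_output(candidates_per_output: List[List[Equation]],
                      max_complexity: int) -> List[Equation]:
    """Pick, independently for each output, the lowest-loss equation within a complexity budget."""
    chosen = []
    for equations in candidates_per_output:
        allowed = [eq for eq in equations if eq[0] <= max_complexity]
        if not allowed:
            raise ValueError(f"no equation with complexity <= {max_complexity}")
        chosen.append(min(allowed, key=lambda eq: eq[1]))
    return chosen


def fuse(chosen: List[Equation]) -> Callable[[Sequence[float]], List[float]]:
    """Combine the selected scalar equations into one vector-valued function."""
    fns = [fn for _, _, fn in chosen]
    # Each output is still evaluated separately: nothing is shared between equations.
    return lambda x: [fn(x) for fn in fns]


candidates_per_output = [
    [(1, 2.0, lambda x: x[0]), (5, 0.1, lambda x: math.cos(x[0]) * x[1])],
    [(1, 1.5, lambda x: x[1]), (3, 0.2, lambda x: x[1] - 2.0)],
]
f_simple = fuse(select_per_output(candidates_per_output, max_complexity=3))
f_complex = fuse(select_per_output(candidates_per_output, max_complexity=5))
```

Because selection happens per output before fusing, each output can land at a different point on its own accuracy/complexity trade-off; the fused function is only a convenience wrapper, consistent with the doubt about any speedup.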

patrick-kidger commented 3 years ago

Hmm. I see your point. Perhaps only offer export to SymPy, and document that external libraries exist to convert SymPy into PyTorch/JAX. The end user can then select their SymPy expressions from the various candidates however they like.
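A minimal sketch of the SymPy-only route: once one SymPy expression has been picked per output, `sympy.lambdify` can turn the whole list into a single vector-valued numeric function. The two expressions are made up for illustration, and NumPy stands in for PyTorch/JAX here; for PyTorch, sympytorch's `SymPyModule` (which, per this thread, accepts a list of expressions) would play the same role.

```python
import numpy as np
import sympy

x0, x1 = sympy.symbols("x0 x1")
# One chosen SymPy expression per output of y; these particular expressions are made up.
chosen = [sympy.cos(x0) * x1, x1 - 2]

# Given a list of expressions, lambdify builds a function returning a list of outputs.
f = sympy.lambdify([x0, x1], chosen, modules="numpy")

x = np.array([[0.0, 5.0], [np.pi, 1.0]])
out = np.stack(f(x[:, 0], x[:, 1]), axis=-1)  # shape (n_samples, n_outputs)
```

One caveat of this design: an expression that simplifies to a constant comes back as a scalar rather than an array, so it would need broadcasting (e.g. with `np.broadcast_to`) before stacking.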