MilesCranmer / PySR

High-Performance Symbolic Regression in Python and Julia
https://astroautomata.com/PySR
Apache License 2.0

[Feature]: Add complexity calculation for user defined expression #339

Closed OsAmaro closed 6 months ago

OsAmaro commented 1 year ago

Feature Request

Hi. I've recently started using PySR and I would like to suggest a new feature that I think would make the code even more user-friendly.

Would it be possible to have more direct access to the function that computes the complexity such that one can compare expressions found by PySR and those found in the literature?

For example: model.complexity('1 + x_0 + x_1**2')

This would allow the user to easily map the expressions found in the literature on the complexity vs accuracy plots.

Thank you in advance.

MilesCranmer commented 1 year ago

That's a good idea. This would probably have to be done by calling out to the Julia backend's compute_complexity function, and using jl.eval() to evaluate the expression.

It is a bit tricky because we would have to overload the user-defined operators to work for the expression type so that 1 + x_0 becomes an expression object. And there are some difficulties because you wouldn't want, e.g., 1 + 2 to automatically simplify to 3 before evaluation of the complexity.

Another option is to do the complexity calculation in pure Python, but that would add maintenance burden. It would also cause issues because some of the SymPy operators are mapped to multiple primitive operators (e.g., cos2(x)=cos(x^2) would count as cos and ^2, even though it would be a single operator).
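To make the pure-Python option concrete, here is a minimal sketch that counts nodes in an expression string using Python's standard `ast` module. It is not PySR's implementation. It assumes every operator, variable, and constant costs 1, and the helper name `approximate_complexity` is made up for illustration. Because it only parses and never evaluates, 1 + 2 stays at three nodes rather than collapsing to 3.

```python
import ast


def _count_nodes(node: ast.AST) -> int:
    """Recursively count operators, variables, and constants."""
    if isinstance(node, ast.Expression):
        return _count_nodes(node.body)
    if isinstance(node, (ast.Name, ast.Constant)):
        return 1  # a variable such as x_0, or a numeric constant
    if isinstance(node, ast.BinOp):
        return 1 + _count_nodes(node.left) + _count_nodes(node.right)
    if isinstance(node, ast.UnaryOp):
        # Note: "-1" parses as a unary minus applied to 1, so it counts as 2.
        return 1 + _count_nodes(node.operand)
    if isinstance(node, ast.Call):
        # The function name itself is the operator; count it once.
        return 1 + sum(_count_nodes(arg) for arg in node.args)
    raise ValueError(f"Unsupported syntax: {type(node).__name__}")


def approximate_complexity(expression: str) -> int:
    """Hypothetical helper: node count with a cost of 1 per node (an assumption)."""
    return _count_nodes(ast.parse(expression, mode="eval"))


print(approximate_complexity("1 + x_0 + x_1**2"))
print(approximate_complexity("cos2(x_0)"))
print(approximate_complexity("cos(x_0**2)"))
```

The last two calls show the mismatch described above: a custom operator written as cos2(x_0) is counted differently from its expanded form cos(x_0**2), so a count done outside the backend can disagree with what the search itself uses.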

MilesCranmer commented 1 year ago

This should be much easier after PR #429 passes. Perhaps we could make a function to convert a string into a SymbolicRegression.jl equation (by using @extend_operators on sr_options_).

MilesCranmer commented 1 year ago

@OsAmaro – #430 fixes this. Could you please try it out?

OsAmaro commented 1 year ago

Hey @MilesCranmer,

I think this method works! I tried this PR on Docker for some examples and it seemed consistent. Appreciate your work. Ideally one would bypass the .fit entirely and just define the PySRRegressor model, but this is already very helpful! Many thanks!

Cheers, Óscar

MilesCranmer commented 1 year ago

I could potentially define a method that runs all the setup steps involved in .fit but without running the actual equation search. Right now .fit turns on Julia, creates the Julia options, and imports the backend. But those could perhaps be refactored into a separate method...

(Any help appreciated, as professor life is quite busy 🙂 )

MilesCranmer commented 6 months ago

Okay, the required functionality has now been implemented in #564.

First, see https://github.com/MilesCranmer/PySR/discussions/550#discussioncomment-8842600 for how to create user-defined expressions in PySR.

Next, you can get complexity as follows:

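from pysr import jl  # Julia `Main` module exposed by PySR

# model: a PySRRegressor that has already been fit, thus having `.julia_options_`
# tree:  the expression you have defined by hand (see the discussion linked above)

jl.compute_complexity(tree, model.julia_options_)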