MilesCranmer / SymbolicRegression.jl

Distributed High-Performance Symbolic Regression in Julia
https://ai.damtp.cam.ac.uk/symbolicregression/
Apache License 2.0

Increasing precision goal when optimizing constants #120

Closed · johanbluecreek closed this issue 2 years ago

johanbluecreek commented 2 years ago

Hi,

I can't really figure out how to increase the precision when optimizing constants. I'm looking at data corresponding to the analytical expression sin(x)/exp(x/5). The search yields one form or another of the correct expression; that is, I get e.g. sin(x)*exp(x / -4.9997854) as a candidate solution. But it is basically never the best solution in the hall of fame, because instead of optimizing the constant (-4.9997854 in the example), candidates with additional corrections achieve a lower loss, e.g. sin(x)*exp(x / (2.312148e-5*x - 4.9998913)).

I tried tuning optimize_probability (to 1.0) and optimizer_iterations (to 256), trying both the "NelderMead" and "BFGS" optimizers, but I see no real gain in the precision of the constants.
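For reference, my setup looks roughly like this (a minimal sketch; the operator set, data range, and iteration count are illustrative rather than my exact script):

```julia
using SymbolicRegression

# Synthetic data for the target expression sin(x) / exp(x / 5)
X = reshape(collect(range(-5.0, 5.0, length=200)), 1, :)
y = sin.(X[1, :]) ./ exp.(X[1, :] ./ 5)

options = SymbolicRegression.Options(
    binary_operators=(+, -, *, /),
    unary_operators=(sin, exp),
    optimize_probability=1.0,    # always run constant optimization
    optimizer_iterations=256,
    optimizer_algorithm="BFGS",  # also tried "NelderMead"
)

hof = EquationSearch(X, y; options=options, niterations=40)
```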

Is there a way to solve this issue, or am I just missing some option that does this?

MilesCranmer commented 2 years ago

Hi @johanbluecreek, when you say "best," do you mean the most accurate? Or do you mean that the expression never appears in the entire hall of fame, even at the simpler complexities?

It's often the case that the most accurate expression has some extra pieces to it; this is usually due to numerical error. Can you try passing in a Float64 array instead? Then all the operations will switch to their 64-bit equivalents, which might rule out some of these. You could also try the proposed change in #119, which will use Float64 constants in expressions if you pass a Float64 array.
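For example (a rough sketch, reusing whatever `options` you already have):

```julia
# Convert the inputs to 64-bit so the search runs in Float64 throughout:
X64 = Float64.(X)
y64 = Float64.(y)
hof = EquationSearch(X64, y64; options=options)
```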

However, ultimately it will always be the case that small corrections to the numerical formula appear in the most accurate expressions. Operations other than +, -, *, /, etc. are often numerically approximated on hardware with a series expansion and are only accurate to a certain precision themselves. For example, if you compute log(1+erf(x)) for x < -1, the result is surprisingly inaccurate compared to a high-precision calculation. You could test this by performing the calculation with BigFloat. It could also be that the order in which Julia computes exp(x/5) differs from how SymbolicRegression.jl computes it, which might introduce a small amount of error. Normally, to get the "best" expression, you would consider the accuracy-complexity tradeoff - that's what the score column shows.
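As a rough illustration of the kind of check I mean (uses SpecialFunctions.jl for erf; the specific x value is arbitrary):

```julia
using SpecialFunctions  # provides erf

x = -3.0
# Standard 64-bit evaluation:
f64 = log(1 + erf(x))
# High-precision reference computed in BigFloat:
ref = log(1 + erf(big(x)))
# Relative error of the Float64 result; due to cancellation in 1 + erf(x)
# this is typically far larger than eps(Float64):
rel_err = abs(f64 - Float64(ref)) / abs(Float64(ref))
```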

Cheers, Miles

johanbluecreek commented 2 years ago

Thanks for the reply. I just found out that the data I was using as input to EquationSearch did indeed have rather large numerical errors, which I was not expecting. Correcting for this produces much better results.

MilesCranmer commented 2 years ago

Cool!

Also, your comment made me realize what I said might not be the whole story...

Do you think you could also try adding some of these options (http://julianlsolvers.github.io/Optim.jl/v0.9.3/user/config/#general-options) to the following line? https://github.com/MilesCranmer/SymbolicRegression.jl/blob/7b4fecf9e136eb7b3acf29d0669e00e772efff56/src/ConstantOptimization.jl#L44

It might help with getting better precision in the constants regardless of numerical error.
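Concretely, that optimize call could be extended roughly like this (a sketch only; `f`, `x0`, and the tolerance values are placeholders, not the actual names in ConstantOptimization.jl):

```julia
using Optim

result = Optim.optimize(
    f, x0, Optim.BFGS(),
    # General options from the linked page: more iterations and tighter
    # convergence tolerances on x, f, and the gradient.
    Optim.Options(iterations=1000, x_tol=1e-12, f_tol=1e-12, g_tol=1e-12),
)
```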

johanbluecreek commented 2 years ago

You mean adding some of the fields of Optim.Options to SymbolicRegression.Options as well (but with suitable names), to be passed on that line?

Would it not be simpler to give SymbolicRegression.Options a field, say optimizer_options, of type Optim.Options that is passed on that line, instead of having several of these as individual options (optimizer_iterations may perhaps be kept as an overriding short-hand)?
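From the user's side that would look something like this (hypothetical, since no such field exists yet):

```julia
using Optim, SymbolicRegression

# Hypothetical `optimizer_options` field holding an Optim.Options instance
options = SymbolicRegression.Options(
    binary_operators=(+, -, *, /),
    unary_operators=(sin, exp),
    optimizer_options=Optim.Options(iterations=1000, g_tol=1e-12),
)
```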

johanbluecreek commented 2 years ago

Oh, or did you just mean to see whether that can also solve the issue I had? Since my input data differed from the actual function values by roughly ~1e-7, I would expect it to fit that data with a sin(x)/exp(x/N)-shaped expression only down to that same ~1e-7, which indeed it did.

MilesCranmer commented 2 years ago

Ah, I see, thanks!

That is a great idea, to have an optimizer_options field of type Optim.Options. Would you be interested in contributing something like that in a PR? I think the constructor should accept either a Dict or an Optim.Options as an input type, and then store an Optim.Options instance in SymbolicRegression.Options.
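Something roughly like this inside the constructor (a sketch; the helper name is made up):

```julia
using Optim

# Hypothetical helper: normalize the user input to an Optim.Options instance.
function build_optimizer_options(opts)
    opts isa Optim.Options && return opts
    # Otherwise assume a Dict of keyword arguments, e.g. Dict(:iterations => 1000)
    return Optim.Options(; opts...)
end
```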

johanbluecreek commented 2 years ago

Sure, I'll take a look at making a PR for that.

MilesCranmer commented 2 years ago

Awesome, thanks! New contributors always welcome 🙂