Hi,

I can't really figure out how to increase the precision when optimizing constants. I'm looking at data corresponding to the analytical expression `sin(x)/exp(x/5)`. The search yields one form or another of the correct expression; that is, I get e.g. `sin(x) * exp(x / -4.9997854)` as a candidate solution. But it is basically never the best solution in the hall of fame, because instead of optimizing the constant (`-4.9997854` in the example), candidates with additional corrections are better (lower loss), e.g. `sin(x) * exp(x / (2.312148e-5x - 4.9998913))`.

I tried tuning `optimize_probability` (to `1.0`) and `optimizer_iterations` (to `256`), trying both `"NelderMead"` and `"BFGS"`, but I see no real gain in the precision of the constants. Is there a way to solve this issue, or am I just missing some option that does this?
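For reference, a minimal sketch of roughly what I am running (the data grid, operator set, and `niterations` are illustrative placeholders, not my exact setup; I'm assuming the algorithm is selected via `optimizer_algorithm`):

```julia
using SymbolicRegression

# Data corresponding to sin(x)/exp(x/5); the grid is an assumption
X = reshape(collect(range(-5.0, 5.0, length=200)), 1, :)
y = sin.(X[1, :]) ./ exp.(X[1, :] ./ 5)

options = Options(
    binary_operators=(+, -, *, /),
    unary_operators=(sin, exp),
    optimize_probability=1.0,    # always run constant optimization
    optimizer_iterations=256,    # more optimizer steps
    optimizer_algorithm="BFGS",  # also tried "NelderMead"
)
hall_of_fame = EquationSearch(X, y, niterations=40, options=options)
```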
Hi @johanbluecreek, when you say "best," do you mean the most accurate? Or do you mean that the expression never appears in the entire hall of fame, even at the simpler complexities?
It's often the case that the most accurate expression has some extra pieces to it; this is often due to numerical error. Can you try passing in a `Float64` array instead? Then all the operations will switch to their 64-bit equivalents, which might rule out some of these. You could also try the proposed change in #119, which will use `Float64` constants in expressions if you pass a `Float64` array.
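For example, a minimal sketch of that conversion (assuming your arrays currently hold `Float32` values):

```julia
# Promote the dataset to 64-bit before searching
X64 = Float64.(X)
y64 = Float64.(y)
hall_of_fame = EquationSearch(X64, y64, options=options)
```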
However, it will ultimately always be the case that small corrections to the numerical formula appear in the most accurate expressions. Operations other than `+`, `-`, `*`, `/`, etc., are often numerically approximated on hardware with a series expansion and are only accurate to a certain precision themselves. For example, if you compute `log(1 + erf(x))` for `x < -1`, the result is surprisingly inaccurate compared to a high-precision calculation. You could test this by performing the calculation with `BigFloat`. It could also be that the order in which Julia computes `exp(x/5)` differs from how SymbolicRegression.jl computes it, which might give a small amount of error. Normally, to get the "best" expression you would consider the accuracy-complexity tradeoff; that's what the `score` column shows.
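For instance, a quick sketch of that `log(1 + erf(x))` check (assuming `SpecialFunctions` for `erf`; the point `x = -3.0` is arbitrary):

```julia
using SpecialFunctions  # provides erf

x = -3.0
f64 = log(1 + erf(x))                 # all-Float64 computation
ref = log(1 + erf(BigFloat("-3.0")))  # high-precision reference

println(f64)                 # Float64 result
println(Float64(ref))        # reference, rounded back to Float64
println(f64 - Float64(ref))  # discrepancy from cancellation in 1 + erf(x)
```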
Cheers, Miles
Thanks for the reply. I just found out that the data I was using as input to `EquationSearch` indeed had rather large numerical errors, which I was not expecting. Correcting for this produces much better results.
Cool!
Also, your comment made me realize what I said might not be the whole story...
Do you think you could also try adding some of these options, http://julianlsolvers.github.io/Optim.jl/v0.9.3/user/config/#general-options, to the following line? https://github.com/MilesCranmer/SymbolicRegression.jl/blob/7b4fecf9e136eb7b3acf29d0669e00e772efff56/src/ConstantOptimization.jl#L44 It might help with getting better precision in the constants regardless of numerical error.
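For instance, something along these lines (a sketch only; `loss`, `x0`, and the tolerance values are placeholders, not the actual code on that line):

```julia
using Optim

# Tighter convergence criteria for the constant-optimization step
opt = Optim.Options(
    iterations = 1_000,  # allow more steps than the default
    g_tol = 1e-14,       # tighter gradient tolerance
    f_tol = 1e-14,       # tighter objective tolerance
)

# `loss` and `x0` stand in for the objective over the expression's
# constants and their current values:
result = Optim.optimize(loss, x0, BFGS(), opt)
```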
You mean adding some of the fields of `Optim.Options` also to `SymbolicRegression.Options` (but with suitable names), to be passed on that line? Would it not be simpler to give `SymbolicRegression.Options` a field, say `optimizer_options`, of type `Optim.Options` that is passed on that line, instead of having several of these as individual options? (`optimizer_iterations` may perhaps be kept as an overriding short-hand.)
Oh, or did you just mean to see whether that can also solve the issue I had? Since my input data differed from the actual function values by some `~1e-7`, I would expect it to fit that data with a `sin(x)/exp(x/N)`-shaped expression no better than that same `~1e-7`, which indeed it did.
Ah, I see, thanks!
That is a great idea, to have an `optimizer_options` field of type `Optim.Options`. Would you be interested in contributing something like that in a PR? I think the constructor should accept either a `Dict` or an `Optim.Options` as the input type, and then store an `Optim.Options` instance in `SymbolicRegression.Options`.
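A rough sketch of that conversion logic, just to illustrate the idea (the helper name is hypothetical, not the final design):

```julia
using Optim

# Accept either an Optim.Options or a Dict and always store an
# Optim.Options instance (hypothetical helper):
to_optim_options(o::Optim.Options) = o
to_optim_options(d::Dict{Symbol}) = Optim.Options(; d...)

# Either form would then configure the optimizer:
to_optim_options(Optim.Options(iterations=1_000, g_tol=1e-14))
to_optim_options(Dict(:iterations => 1_000, :g_tol => 1e-14))
```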
Sure, I'll take a look at making a PR for that.
Awesome, thanks! New contributors always welcome 🙂