johanbluecreek opened 2 years ago
Interesting question. I see two options:
One option is to use SymbolicRegression.jl like an nsimplify on steroids. For example, let's say we want to find a Ramanujan-like approximation to pi. To do this, you pass integers as features, and set complexity_of_constants=100 - this will make real constants too complex to use.
options = Options(
    # Make constants prohibitively expensive:
    complexity_of_constants=100,
    unary_operators=(sqrt, square),
    binary_operators=(+, *, /, -),
    mutationWeights=[0.0, 0.47, 0.79, 5.1, 1.7, 0.0020, 0.00023, 0.21],
    # ^ Set p(mutate_constant)=0.0
    shouldOptimizeConstants=false,
    # ^ Set constant optimization off (so we don't waste cycles)
    parsimony=0.001,
)
# Integers from 1 to 10:
X = reshape(collect(1.0:10.0), 10, 1)
y = [float(pi)]
EquationSearch(X, y; options=options, multithreading=true, niterations=1000)
The output this gives is actually pretty good, recovering many approximations of pi! https://en.wikipedia.org/wiki/Approximations_of_%CF%80 (I wonder if some of these are even known...?)
You could do something similar for other constants you wish to include - you could also set varMap to make the printout show those constants by name in such a case.
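For instance, a minimal sketch of that (assuming the same old-style API as the example above, and that EquationSearch accepts a varMap of feature names):

# Hypothetical: add e and sqrt(2) alongside the integers as extra "features",
# and name them via varMap so the printed equations are readable.
consts = [collect(1.0:10.0); exp(1.0); sqrt(2.0)]
X = reshape(consts, length(consts), 1)
y = [float(pi)]
names = [string.(1:10); "e"; "sqrt2"]
EquationSearch(X, y; options=options, niterations=1000, varMap=names)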
One option is to basically apply solution 1, but use it for the search itself (i.e., concatenate the constants/integers with the data features at each row); a sketch of what I mean is below.
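A minimal sketch of that idea, with made-up data (the Options from the example above and the same EquationSearch call are assumed):

# Hypothetical data: 3 real features, 100 rows (X is laid out as nfeatures x nrows).
X = randn(3, 100)
y = 2 .* X[1, :] .+ X[2, :] .^ 2 ./ 3
# Append one extra "feature" row per candidate integer constant:
consts = collect(1.0:10.0)
X_aug = vcat(X, repeat(consts, 1, size(X, 2)))  # now (3 + 10) x 100
EquationSearch(X_aug, y; options=options, niterations=100)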
More generally (say you want to include any integer), this is a bit tricky, especially because Optim.jl can only optimize float-like constants. But you could basically rely on the mutate_constant function (here) to explore the space of integers.
Right now, the library assumes that Node{T} and Dataset{T} have the same type T. In other words: expression constants have the same type as the element type of the dataset. However, this isn't really necessary - one could rewrite the backend to allow for Node{T1}, Dataset{T2}, which would allow you to directly store integers or rational numbers in the expressions, while still computing with float precision. One could then look at using JuMP.jl in place of Optim.jl for explicit integer/rational optimization, if such a thing is possible.
Another option is to implement a function
function val(tree::Node{T})::Int where {T}
    convert(Int, tree.val)
end
and make it so that every time tree.val is called, you would instead call val(tree). You could then set that output type (here, an Int) to some user-specified type (a hypothetical sketch of that is below). I think the Node{T1} / Dataset{T2} solution is more elegant though.
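For example, a hypothetical sketch of that user-specified-type variant (the stored value stays a float; only reads get converted):

# Hypothetical accessor; OUT is whatever type the user asks for,
# e.g. val(tree, Int) or val(tree, Rational{Int}).
val(tree::Node, ::Type{OUT}) where {OUT} = convert(OUT, tree.val)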
comment deleted [sorry for the off topic question]
Hi @qwertyjl - I think this question is unrelated to the use of SymbolicRegression.jl/PySR - apologies but I do not have time to answer general math/science questions. Best, Miles
@MilesCranmer
Right now, the library assumes that Node{T} and Dataset{T} have the same type T. In other words: expression constants have the same type as the element type of the dataset. However, this isn't really necessary - one could rewrite the backend to allow for Node{T1}, Dataset{T2}, which would allow you to directly store integers or rational numbers in the expressions, while still computing with float precision. One could then look at using JuMP.jl in place of Optim.jl for explicit integer/rational optimization, if such a thing is possible.
That would indeed be more elegant I think!
Nelder-Mead, the simplex optimization, looks like it could be made to work on integers (although I assume it will not be optimal). An alternative would be to implement a really simple optimization for integers ("add/subtract 1 to a constant") together with a warning, and offer the user a way to supply a custom optimizer if they want (i.e. they can write a function that takes the same arguments as the call to Optim.jl); a sketch of the simple version is below. What do you think about that?
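A hypothetical sketch of that simple integer optimization, written against plain vectors (hooking it into the package's constant-optimization call is left out):

# Hill-climb each constant by +/-1 steps and keep any change that improves the
# loss; `loss_of` stands in for evaluating the expression's loss with the
# given constants plugged in.
function optimize_integers!(constants::Vector{Int}, loss_of; max_passes::Int=10)
    best = loss_of(constants)
    for _ in 1:max_passes
        improved = false
        for i in eachindex(constants), step in (-1, 1)
            constants[i] += step
            trial = loss_of(constants)
            if trial < best
                best = trial
                improved = true
            else
                constants[i] -= step  # revert
            end
        end
        improved || break
    end
    return best
end

For example, optimize_integers!([3, 7], c -> (c[1] - 2)^2 + (c[2] - 5)^2) moves the constants to [2, 5] and returns a loss of 0.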
In any case, the splitting into Node{T1}, Dataset{T2} needs to deal with the optimization too I think. Also with the simplification, since stuff like "const1 / const2" might introduce rationals into a formula that should only contain integers.
I just came across Julia and this project, so I hope I do not talk too much nonsense! :)
Thank you for creating this great project!
@johanbluecreek Locally, I made SymbolicRegression.jl generate arbitrary integers by using these adaptations:
(Like Miles hinted) I changed mutate_constant
to basically this:
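# Shift the constant by a random integer in -5:5, so integer-valued constants stay integer-valued: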
newval = node.val + convert(T, rand(rng, -5:5))
node.val = newval
make_random_leaf
now looks like this:
function make_random_leaf(
    nfeatures::Int, ::Type{T}, ::Type{N}, rng::AbstractRNG=default_rng()
) where {T<:DATA_TYPE,N<:AbstractExpressionNode}
    if rand(rng, Bool)
        return constructorof(N)(; val=convert(T, rand(rng, -50:50))) # changed to return integers
    else
        return constructorof(N)(T; feature=rand(rng, 1:nfeatures))
    end
end
I also set should_simplify=false, since otherwise constants might get optimised into non-integers. However, I noticed that one can sometimes leave it on, since the (mostly small) non-integer results of simplification tend to get dominated by newly introduced integer constants.
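For reference, a sketch of how that looks in the Options call (only the relevant keywords shown; other settings omitted):

options = Options(;
    binary_operators=(+, -, *, /),
    should_simplify=false,  # keep simplification from folding integer constants into floats
)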
I have used this package for a work-flow that is basically: solve a differential equation (DE) using DifferentialEquations.jl, and then give that solution to SymbolicRegression.jl to find the analytical expression. From the DE it is quite obvious that the resulting expression should only contain integers or rationals. The optimization that is carried out for constants in the expressions gets very close, but it would be nice if such constants could be turned into exact integers or rationals. Is there such functionality already in SymbolicRegression.jl that I have missed, or would it be useful to have?
Simple examples would be having solutions like 5.0000000002*x1, or whatever, rounded to 5.0 * x1; more advanced being the rationals, with 0.200000002 * x2 fixed to x2 / 5.0 or similar; or more advanced still, sin(x1 * 0.31830988618454) to sin(x1 / π). In SymPy there is a function nsimplify that can handle the numerical part of such functionality, and works quite well, but I don't know if there is a Julia package that does the same thing.
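For the purely rational cases above, something like Base's rationalize would already handle the numeric step (though not special constants like π):

rationalize(5.0000000002; tol=1e-6)      # -> 5//1
rationalize(0.200000002; tol=1e-6)       # -> 1//5
rationalize(0.31830988618454; tol=1e-9)  # -> a large rational approximating 1/π; the symbolic π is not recovered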