Closed MilesCranmer closed 2 years ago
@AlCap23 this might be useful for debugging:
catch e
@warn "Error in symbolic_util simplify: $(e)\nFrom initial equation: $(stringTree(init_node, options))"
return init_node
end
where the preamble needs:
using Logging
@from "EquationUtils.jl" import countNodes, stringTree
I have this implemented on the dev
branch.
For example, the following tree:
> tree = Node("x1") * Node("x1") + Node("x1") * Node("x1") + 3f0
> printTree(tree, options)
(((x1 * x1) + (x1 * x1)) + 3.0)
cannot be simplified:
> simplifyWithSymbolicUtils(tree, options)
Error in symbolic_util simplify: ErrorException("Operation ^ in expression not found in operations (+, *, /, -)!")
SymbolicUtils normally converts multiplication into a polynomial basis, then collects common terms, which is fine, because at the end I run them through multiply_powers
to convert ^
into repeated multiplication. However it looks like multiply_powers
is not being called so the ^
is being left in the SymbolicUtils version of the equation, which results in an error.
multiply_powers
:
https://github.com/MilesCranmer/SymbolicRegression.jl/blob/561c9d7270073e28b1c82d43d7c328c5af0ebb1a/src/CustomSymbolicUtilsSimplification.jl#L16-L78
I see a problem:
julia> using SymbolicUtils
julia> @syms x
julia> x * x
x ^ 2
This must have become the default behaviour in a new SymbolicUtils. This is why simplification is broken.
Raised an issue on SymbolicUtils.jl: #428
Just a comment:
I had a similar issue last year implementing my own symbolic regression program. It also created multiple other issues with my algorithm and I ended up deciding against simplifying expressions at all during the search. I expected it to reduce the size of the space of all possible expressions considerably, but it ended up hurting discoverability a lot. Have you benchmarked your algorithm with and without simplifying equations during the search?
Indeed I think it would help discoverability because it collapses the search space. Note that Symbolics.jl is not "simplifying" the expressions, that's what it looks like to the user but that's somewhat of a misnomer. In reality, Symbolics.jl is doing things like storing expressions like
Add(T, coeff, dict::Dict)
Represents `coeff + (key1 * val1) + (key2 * val2) + ...`
That representation will automatically simplify things when you put things into it, because adding a like key to the dictionary and incrementing will do x + x => 2x
. So one reason for this is cheap auto-simplify, which gave around a 1000x speedup. But the other thing that happens is that searches in these representations are a lot leaner. x + 2y
and 2y + x
have the same representation, so the graph form Term(+,[Term(*,[2,y]),x])
has a combinatorically larger search space, with an exponential growth in the number of redundant representations. simplify
on the non-automatic operations thus is a lot faster because it does not have to check all permutations of a + (b + c)
vs (a + b) + c
etc. because that is handled by the Add
canonical form.
What @shashi is doing is creating variable types that do no use this canonical form and instead default to always building Term
graphs, though I would think that really using the reduced representations should just end up giving better algorithms by encoding implicit constraints to be explicit.
Hope @syms x::LiteralReal
works out here!
Thanks @shashi! Will work on integrating this.
@ChrisRackauckas, thanks for explaining all of this. Indeed this data structure seems to make a lot of sense and it is unfortunate I can't exploit it here for all cases. (It makes me think of a radix sort - the way each specific term is assigned to a specific bin.) I wonder if there is a way to have a similar data structure, but for arbitrary user-defined binary operators (+ set of rules).
@cobac - I haven't done a head to head comparison personally, but apparently some PySR users have been using earlier versions of PySR because they produced better results and faster. I looked into why this is and it was indeed because the simplification was turned on for those previous versions. I note that simplification only occurs 1 in 1000 mutations in SymbolicRegression.jl (controlled by one of the hyperparams) - which allows the evolution to freely explore equation space without being collapsed into a simplified form. Maybe this is why it doesn't hurt my performance but actually helps it.
Hey, sorry for being late to the party! @MilesCranmer to you still need help or did the proposed solution of @shashi worked? Otherwise, I'll start a PR based on the example above within one week.
Thanks @AlCap23! I haven't started on this yet so if you have time to work on it I would definitely appreciate the assistance.
To get started, I would revert 3b2608da21a0c9c57282cf40feac7649ef9849d9, bump SymbolicUtils to 0.19, then switch the symbol mappings over to LiteralReal.
The unit test here: https://github.com/MilesCranmer/SymbolicRegression.jl/blob/8d286a258229771f4071e4f57559ba26b893d11e/test/test_simplification.jl#L23-L26 should capture the desired behaviour: whether x1 x1 + x1 x1 simplifies to 2 x1 x1, rather than 2 * x1 ^ 2, given that the user does not have ^
in their operator list.
Sorry for the delay, I have been busy with another PR right now and start moving this week 😅. I'll probably have time in two weeks to have a look. Would that still be alright?
Sure, no problem! Hope your moving goes well :)
Another potential solution is to remove the simplification stuff. I'm not actually using it by default now, as I think large expressions slow down things quite a bit and caused some crashes. But the search is still quite good without a simplification algorithm; it seems to simplify automatically just from via the complexity penalty.
If the simplification backend is removed then all I need is the export to SymbolicUtils.jl, which is fairly easy to get working again.
Fixed on v0.9.0+
cc @AlCap23
I've noticed many times during training that equations are overly complex, so I think simplification isn't working for some reason. I narrowed it down to this: https://github.com/MilesCranmer/SymbolicRegression.jl/blob/561c9d7270073e28b1c82d43d7c328c5af0ebb1a/src/SimplifyEquation.jl#L153-L155
I think this is too general and is hiding the current issues in the simplifier. When I remove it, I see that many times, simplification is skipped for equations that should normally be easy to simplify.
@AlCap23 do you think you could help me fix the
convert
functions to capture the current API of SymbolicUtils? I think it will greatly improve the results of EquationSearch. Thanks!