Open AnasAbdelR opened 2 months ago
Great question. This is something I’m really eager to have. Some help would be much appreciated though.
The ongoing effort: the PR https://github.com/SymbolicML/DynamicExpressions.jl/pull/73 adds some necessary utilities to get this, which would let you be much more flexible in how you define an expression, such as constraining functional forms or learning parametric functions with per-"category" parameters.
If you are interested in helping, the next step would be to modify SymbolicRegression to use AbstractExpression, added in that PR, rather than the current behavior, which uses AbstractExpressionNode (a less flexible type). This should be possible to work on right away because the AbstractExpression interface has matured; I'm just adding more tests at the moment.
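To make "parametric functions with per-category parameters" concrete, here is a minimal NumPy sketch. It is not the DynamicExpressions.jl API: the function `eval_parametric`, the fixed form `p0 * x**2 + p1`, and the data are all made up for illustration. The point is only that the functional form is shared across rows while the parameter values are looked up by each row's category.

```python
import numpy as np


def eval_parametric(x, category, params):
    """Evaluate y = p0[c] * x**2 + p1[c], where c is the category of each row.

    Hypothetical helper: the functional form is shared by all rows, and only
    the parameter values differ from one category to the next.
    """
    p0 = params[0, category]  # one p0 value per row, picked by category
    p1 = params[1, category]  # one p1 value per row, picked by category
    return p0 * x**2 + p1


x = np.array([1.0, 2.0, 1.0, 2.0])
category = np.array([0, 0, 1, 1])  # which category each row belongs to
params = np.array([
    [1.0, 3.0],   # p0 for category 0 and category 1
    [0.5, -1.0],  # p1 for category 0 and category 1
])

y = eval_parametric(x, category, params)
print(y)
```

In a search over expressions, the form itself would be what gets mutated, with the `params` table re-optimized for each candidate.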
Another option, if you don’t wish to do it directly with SymbolicRegression.jl, is to do what we did in https://arxiv.org/abs/2202.02306. We learn a single expression (the force law) while also learning per-planet mass parameters. It’s easier to do this with deep learning; you essentially have the per-system parameters be trainable, and simultaneously fit an MLP. Then, finally, use the method in https://arxiv.org/abs/2006.11287 (basically, fit the inputs and outputs of the MLP with PySR) to get the actual parametric form of the equation.
This approach has some drawbacks compared to a regular genetic algorithm, so I think it would be nice to have a proper implementation directly in SR.jl.
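A rough sketch of the joint-fit idea, under heavy simplifying assumptions: instead of an MLP trained by gradient descent, the shared function here is a linear model over a small polynomial basis, and the fit uses alternating least squares in NumPy. The data, the true law `1 + 0.5 * x**2`, and the per-system scales are synthetic and made up for illustration; this is not the code from either paper. The final step (fitting the learned shared function's inputs and outputs with PySR) is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup (assumption): y = m_i * f(x), one hidden scale m_i per system.
true_m = np.array([1.0, 2.5, 0.4])


def true_f(x):
    return 1.0 + 0.5 * x**2


n_sys, n_pts = len(true_m), 200
x = rng.uniform(0.5, 2.0, size=(n_sys, n_pts))
y = true_m[:, None] * true_f(x)


def features(x):
    # Polynomial basis [1, x, x^2]; a stand-in for the MLP in the text.
    return np.stack([np.ones_like(x), x, x**2], axis=-1)


phi = features(x)   # shape (n_sys, n_pts, 3)
m = np.ones(n_sys)  # trainable per-system parameters
w = np.zeros(3)     # weights of the shared function g(x) = phi(x) @ w

for _ in range(50):
    # Step 1: hold m fixed, fit the shared weights on all systems at once.
    A = (m[:, None, None] * phi).reshape(-1, 3)
    w, *_ = np.linalg.lstsq(A, y.reshape(-1), rcond=None)
    # Step 2: hold w fixed, solve each system's scale in closed form.
    g = phi @ w
    m = (y * g).sum(axis=1) / (g * g).sum(axis=1)
    # Only the product m_i * g(x) is identifiable, so pin m[0] to 1.
    scale = m[0]
    m = m / scale
    w = w * scale

print("relative per-system parameters:", m)
print("shared-function weights:", w)
```

Note the scale ambiguity: only ratios of the per-system parameters can be recovered, which is why the sketch normalizes by the first system.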
Hi @MilesCranmer, I wanted to look into this, if that's alright, and had a few questions:
How does the algorithm work with parametric equations? IIUC, with https://github.com/SymbolicML/DynamicExpressions.jl/pull/73 we can pass in a parametric expression along with the parameter values for each dataset we have. How does the search algorithm work in that case? Looking at the test for parametric expressions in https://github.com/SymbolicML/DynamicExpressions.jl/pull/73/files#diff-22d700493bea715bfef1d81576940fb67942b05bbac15c06b86bf549d6af3407R295, does the expression passed to parse_expression act as a starting point from which other expressions are generated and explored?
It would be very helpful if you could sketch a concrete example of how parametric equations might be used with SymbolicRegression.jl (for my understanding).
It would be great if you could give some pointers on what to change, specifically where to change AbstractExpressionNode to AbstractExpression, since IIUC it would still be converted to a normal Node for evaluation, like in https://github.com/SymbolicML/DynamicExpressions.jl/pull/73/files#diff-22d700493bea715bfef1d81576940fb67942b05bbac15c06b86bf549d6af3407R219. I can look into it and make a PR. (I am still getting familiar with the codebase 😅)
Feature Request
One way to prevent over-fitting when trying to find an equation for a particular trace of data is to provide multiple traces for a single equation as examples of the kind of results the equation produces. Is this something already implemented, and if not what would it take to get there?
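For illustration, one possible way to prepare "multiple traces for a single equation" is to pool them into one dataset while recording which trace each row came from. This is a minimal NumPy sketch with made-up data; `stack_traces` is a hypothetical helper, not part of SymbolicRegression.jl or PySR, and whether the search can consume the trace index directly is not something this sketch assumes.

```python
import numpy as np


def stack_traces(traces):
    """Concatenate (X, y) traces and record which trace each row came from."""
    X = np.concatenate([X_t for X_t, _ in traces], axis=0)
    y = np.concatenate([y_t for _, y_t in traces], axis=0)
    trace_id = np.concatenate(
        [np.full(len(y_t), i) for i, (_, y_t) in enumerate(traces)]
    )
    return X, y, trace_id


# Two made-up traces with a single feature each.
trace_a = (np.array([[0.0], [1.0], [2.0]]), np.array([0.0, 1.0, 4.0]))
trace_b = (np.array([[3.0], [4.0]]), np.array([9.0, 16.0]))

X, y, trace_id = stack_traces([trace_a, trace_b])
print(X.shape, y, trace_id)
```

If the equation has no trace-specific constants, the pooled `X` and `y` can be fit as one ordinary dataset; the `trace_id` column only matters when some parameters should differ from trace to trace.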