iamcalvinlau opened this issue 2 years ago
Hi @iamcalvinlau,
Thanks for the interesting question, and happy to hear you are enjoying the package! This is not built-in, but here is one suggestion. This line of the backend code:
https://github.com/MilesCranmer/SymbolicRegression.jl/blob/a9e696d513a791f48b18389f5e23830d90d32414/src/EvaluateEquation.jl#L145-L149 evaluates a leaf node and returns an array containing the value. The leaf node is either a constant (if `tree.constant` is true) or a variable.
You could basically tweak the constant line to constrain the constant value to be close to unity. For example:
```julia
if tree.constant
    original_constant = tree.val
    adjusted_constant = 1 + 0.1 * tanh(original_constant)
    return (fill(convert(T, adjusted_constant), n), true)
else
    return (cX[tree.feature, :], true)
end
```
So, I have made it so that the constant is always between 0.9 and 1.1, by passing it through the tanh function (which is built-in).
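A quick sanity check of that bound (nothing package-specific here):

```julia
# tanh maps any real input into (-1, 1), so 1 + 0.1 * tanh(x)
# stays strictly inside (0.9, 1.1):
for x in (-100.0, -1.0, 0.0, 1.0, 100.0)
    println(1 + 0.1 * tanh(x))  # ≈ 0.9, 0.9238…, 1.0, 1.0762…, 1.1
end
```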
Then, give PySR the argument `julia_project="<location of .../SymbolicRegression.jl/>"` to point it to your custom backend.
You also need to modify this line: https://github.com/MilesCranmer/SymbolicRegression.jl/blob/a9e696d513a791f48b18389f5e23830d90d32414/src/Equation.jl#L155 with the same transform applied to `tree.val`, so that it gives the correct equation back to PySR (and prints it correctly!).
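If the constant branch of that line does something like `return string(tree.val)` (an assumption; check the revision you have checked out), one way to keep the printed constants consistent with the evaluated ones is a small helper such as this (hypothetical name):

```julia
# Apply the same (0.9, 1.1) squashing used at evaluation time before printing:
string_adjusted_constant(val) = string(1 + 0.1 * tanh(val))
```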
Cheers, Miles
Hi Miles,
Thanks for the suggestions!
I ended up doing something like this (tied to `parsimony` because I was too lazy to be efficient, and shown here in case anyone wants to see the ugly way I did it):
In `EvaluateEquation.jl`:
```julia
original_constant = tree.val
# Round to a number of significant digits tied to the parsimony setting:
n_sigdigits = round(Int, 1 - log10(options.parsimony))
adjusted_constant = round(original_constant, sigdigits=n_sigdigits)
# Zero out constants that are negligibly small:
threshold = options.parsimony * 1e-2
if abs(original_constant) < threshold
    adjusted_constant = 0.0
end
return (fill(convert(T, adjusted_constant), n), true)
```
In `Equation.jl`:
```julia
original_constant = tree.val
n_sigdigits = round(Int, 1 - log10(options.parsimony))
adjusted_constant = round(original_constant, sigdigits=n_sigdigits)
threshold = options.parsimony * 1e-2
if abs(original_constant) < threshold
    adjusted_constant = 0.0
end
return string(adjusted_constant)
```
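Restated as a self-contained Julia function, for anyone who wants to play with the numbers (illustrative only; `round_constant` is not part of either file):

```julia
# With parsimony = 0.001: n_sigdigits = round(Int, 1 - log10(0.001)) = 4
# significant digits, and the zeroing threshold is 0.001 * 1e-2 = 1e-5.
function round_constant(original_constant::Float64, parsimony::Float64)
    n_sigdigits = round(Int, 1 - log10(parsimony))
    adjusted_constant = round(original_constant, sigdigits=n_sigdigits)
    if abs(original_constant) < parsimony * 1e-2
        adjusted_constant = 0.0
    end
    return adjusted_constant
end

round_constant(1.23456789, 0.001)    # -> 1.235 (rounded to 4 sig. digits)
round_constant(2.5387976e-9, 0.001)  # -> 0.0   (below the 1e-5 threshold)
```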
In the end, I was getting nicely printed equations of the right form, but I also realized that terms with zero-valued coefficients were still being counted toward the complexity. More specifically, I was finding equations of the form

$0.0 \, x_{1} + \frac{0.000490 \, x_{1}}{x_{0}}$

but that first term on the left was still counted as part of the overall complexity.
Sorry for the add-on question, but do you have a suggestion on how to make the complexity calculation not take those terms into account?
Thanks! --Calvin
Hi Calvin,
Great that you got it to work!
Regarding not counting 0's, here are two hacks:

1. Set the complexity of a constant to 0 if `tree.val` (or your processed version of `tree.val`) is 0. (`options.complexity_mapping.constant_complexity` is 1 by default, but you could re-define it with `complexity_of_constants=...` passed to PySR.)
2. Modify `simplify_tree` (https://github.com/MilesCranmer/SymbolicRegression.jl/blob/a9e696d513a791f48b18389f5e23830d90d32414/src/SimplifyEquation.jl#L96) to automatically delete part of a tree if it is 0.0 multiplied by something; see the sketch after this list. This requires a bit of work, though.

However, in general, if you run it long enough and with enough `populations`, it should automatically get rid of terms which don't improve the accuracy of an expression. So in principle you shouldn't need either of these options.
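Here is a self-contained sketch of the deletion idea in hack 2, using a hypothetical `ToyNode` type rather than the package's actual `Node` (illustrative only):

```julia
# Minimal expression-tree stand-in; NOT the Node type from SymbolicRegression.jl.
struct ToyNode
    is_constant::Bool
    val::Float64
    op::Union{Nothing,Symbol}    # e.g. :+ or :* for binary nodes
    children::Vector{ToyNode}
end
ToyNode(val::Real) = ToyNode(true, Float64(val), nothing, ToyNode[])
ToyNode(op::Symbol, l::ToyNode, r::ToyNode) = ToyNode(false, 0.0, op, [l, r])

is_zero(n::ToyNode) = n.is_constant && n.val == 0.0

# Recursively collapse any `0.0 * something` subtree to the constant 0.0:
function collapse_zero_products(n::ToyNode)
    n.is_constant && return n
    kids = [collapse_zero_products(c) for c in n.children]
    (n.op == :* && any(is_zero, kids)) && return ToyNode(0.0)
    return ToyNode(false, n.val, n.op, kids)
end

# (0.0 * 3.0) + 2.0  =>  0.0 + 2.0 after simplification:
expr = ToyNode(:+, ToyNode(:*, ToyNode(0.0), ToyNode(3.0)), ToyNode(2.0))
simplified = collapse_zero_products(expr)
```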
Cheers, Miles
First off, thanks for a really great package! I'm definitely a fan of this for many reasons, but a nice transparent package to suggest an interpretable model instead of a blackbox? Instant fan!
**Is your feature request related to a problem? Please describe.**
In my current attempts to use PySR, I mostly use inputs which are already dimensionless, so the numerical values in models should probably be near unity (unless there's an interesting factor I've missed, which is always a possibility).
For example, in a particular test case, I am finding a suggested model of
whereas the correct model would have been
I understand that I can always take this suggested model and do optimization to find the coefficients a bit more clearly after the fact.
**Describe the solution you'd like**
A preferable approach would be the ability to add conditions on constants so that they are dropped by thresholding. More specifically, I'm imagining an option to set a threshold on the magnitude of the constants, so that constants below the threshold are dropped or zeroed out after each iteration (or cycle per iteration?); a small sketch of the idea follows.
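A minimal sketch of what that thresholding could look like, in Julia (the helper name and signature are hypothetical, not an existing PySR or SymbolicRegression.jl API):

```julia
# Hypothetical helper: zero out any constant whose magnitude falls below
# a user-chosen threshold, as in sequential-thresholding schemes:
zero_small_constant(c::Float64, threshold::Float64) = abs(c) < threshold ? 0.0 : c

zero_small_constant(2.5387976e-9, 1e-5)  # -> 0.0  (term would drop out)
zero_small_constant(0.98, 1e-5)          # -> 0.98 (kept)
```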
In the test above, the term with the constant $2.5387976 \cdot 10^{-9}$ would then get dropped, and the resulting suggested model would look like
**Additional context**
I'm thinking about this in the way that SINDy and its variants have something like sequential thresholding, or for cases where a user has already pre-processed the data so that a probably-correct model would have constants near unity.
Thanks in advance!