MilesCranmer / SymbolicRegression.jl

Distributed High-Performance Symbolic Regression in Julia
https://ai.damtp.cam.ac.uk/symbolicregression/
Apache License 2.0

[BUG]: Using SymPy in custom loss function results in julia crash #344

Open FelixSchloms opened 2 months ago

FelixSchloms commented 2 months ago

What happened?

I am working on an invertible symbolic regression task using SymPy in Julia. The goal is to obtain a symbolic expression that can be solved accurately for each of the variables it uses. I was using SymPy to evaluate the performance of the found expression, including after solving it for different variables.

In isolation, my loss function works as intended when I call it directly with a tree, dataset, and options. However, when I use the same loss function in the SRRegressor, the Julia process crashes unexpectedly and the console quits with an error.

Version

0.0.0

Operating System

Windows

Interface

Julia REPL

Relevant log output

The terminal process "C:\Users\LSF5FE\.julia\juliaup\julia-1.10.4+0.x64.w64.mingw32\bin\julia.exe '-i', '--banner=no', '--project=C:\Users\LSF5FE\.julia\environments\v1.10', 'c:\Users\LSF5FE\.vscode\extensions\julialang.language-julia-1.102.2\scripts\terminalserver\terminalserver.jl', '\\.\pipe\vsc-jl-repl-b7766e9e-9bbc-4fc7-a305-294fe0a154ba', '\\.\pipe\vsc-jl-repldbg-49c8b5af-a2bb-4c06-9a60-d1ea4b043f95', '\\.\pipe\vsc-jl-cr-52ed5e9c-c944-49d4-a88c-1bdd7ccf336c', 'USE_REVISE=true', 'USE_PLOTPANE=true', 'USE_PROGRESS=true', 'ENABLE_SHELL_INTEGRATION=true', 'DEBUG_MODE=false'" terminated with exit code: -1073740940

Extra Info

MWE which should reproduce the error:

using SymbolicRegression, LoopVectorization
using SymPy

function loss_invertible(tree, dataset::Dataset{T,L}, options)::L where {T, L}
    pred, flag = eval_tree_array(tree, dataset.X, options)

    if !flag
        return L(Inf)
    end
    # The error occurs as soon as I use any SymPy functionality.
    # I started with:
    y = symbols(dataset.y_variable_name)
end

MilesCranmer commented 2 months ago

This is a known incompatibility between the older PyCall.jl (which SymPy.jl still uses) and PythonCall.jl (which PySR has upgraded to). The fix is to call sympy directly using PythonCall. See https://juliapy.github.io/PythonCall.jl/stable/pythoncall/ for the docs.
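
A minimal sketch of what calling sympy through PythonCall could look like. It assumes PythonCall.jl is installed and that the Python package `sympy` is available in the Python environment PythonCall uses (for example, added with CondaPkg.jl). The symbol names and the equation are placeholders for illustration and are not taken from the issue:

```julia
using PythonCall

# Import the Python sympy module through PythonCall instead of SymPy.jl/PyCall.
sympy = pyimport("sympy")

# Create symbols. Inside a loss function these names could come from
# the dataset (e.g. dataset.y_variable_name), which is an assumption here.
x = sympy.symbols("x")
y = sympy.symbols("y")

# Placeholder equation y = 2x + 1, solved for x.
solutions = sympy.solve(sympy.Eq(y, 2 * x + 1), x)

# `solutions` is a Python list, so indexing is 0-based.
x_of_y = solutions[0]

# Substitute y = 3 and convert the result to a Julia Float64.
value = pyconvert(Float64, pyfloat(x_of_y.subs(y, 3)))
```

Every call into sympy crosses the Julia/Python boundary, so doing this inside a loss function that is evaluated many times is likely to be slow.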

MilesCranmer commented 2 months ago

I would also recommend using SymbolicUtils.jl instead of SymPy; SymbolicRegression.jl has a built-in converter to it. It will be MUCH faster, whereas SymPy will be extremely slow.
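
A rough sketch of the conversion, assuming the exported `node_to_symbolic` function and a SymbolicRegression.jl version in which SymbolicUtils.jl must be loaded alongside it. The operators, the hand-built tree, and the variable name are illustrative only, and the exact keywords may differ between versions:

```julia
using SymbolicRegression
using SymbolicUtils  # assumed to be needed so the conversion methods are available

# Illustrative operator set; a real search would use its own Options.
options = Options(; binary_operators=[+, *], unary_operators=[cos])

# Build a small example tree by hand: cos(x1) + 2.0 * x1
x1 = Node{Float64}(; feature=1)
tree = cos(x1) + 2.0 * x1

# Convert the tree into a SymbolicUtils expression.
expr = node_to_symbolic(tree, options; variable_names=["x1"])
println(expr)
```

Inside a custom loss function, `tree` and `options` are already passed in, so only the `node_to_symbolic` call would be needed there.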

FelixSchloms commented 2 months ago

Thank you for your quick reply. However, as far as I understand, SymbolicUtils doesn't provide a way to solve an equation for a different variable. Symbolics.jl can do this in its latest version, but that version does not seem to be compatible with SymbolicRegression.jl. Adding SymbolicRegression.jl downgrades Symbolics.jl from 6.11 to 5.28, which doesn't provide this functionality for solving symbolic equations for different variables. Is there any way to solve an equation for different variables within the loss function of SymbolicRegression.jl?

MilesCranmer commented 2 months ago

Symbolics.jl is a front end for SymbolicUtils.jl. They’re the same though

FelixSchloms commented 2 months ago

Oh okay, I didn't notice. I am still confused about why the latest version of the Symbolics.jl front end, which seems much easier for me to handle, doesn't work with SymbolicRegression.jl.

MilesCranmer commented 2 months ago

It’s because I haven’t updated the version compatibility for SymbolicUtils yet. This PR needs to merge first: https://github.com/MilesCranmer/SymbolicRegression.jl/pull/326