MilesCranmer / PySR

High-Performance Symbolic Regression in Python and Julia
https://astroautomata.com/PySR
Apache License 2.0
2.11k stars, 198 forks

[BUG]: got an error when setting complexity_of_constants #343

Closed: x66ccff closed this issue 1 year ago

x66ccff commented 1 year ago

What happened?

Hi there,

First of all, thank you for developing PySR! I have been experimenting with it and wanted to test it on the Nguyen benchmark problems, such as Nguyen-1 (x^3+x^2+x). Since x^3+x^2+x contains no numeric constants, I tried setting complexity_of_constants=100 to prevent PySR from finding scalars. However, I got the following error when calling model.fit:

(20, 1)
(20, 1)
[[ 0.85119328]
 [-0.72956365]
 [ 0.33353343]
 [ 0.95291893]
 [ 0.68468416]]
[[ 2.19243833]
 [-0.58562036]
 [ 0.48188176]
 [ 2.72627573]
 [ 1.47445129]]
  Activating project at `~/anaconda3/envs/myenv/share/pysr/depot/environments/pysr-0.11.5`
WARNING: method definition for TwiceDifferentiable at /home/me/anaconda3/envs/myenv/share/pysr/depot/packages/NLSolversBase/cfJrN/src/objective_types/incomplete.jl:96 declares type variable TH but does not use it.
WARNING: method definition for show at /home/me/anaconda3/envs/myenv/share/pysr/depot/packages/Optim/Zq1jM/src/univariate/printing.jl:7 declares type variable T but does not use it.
WARNING: method definition for best_of_sample at /home/me/anaconda3/envs/myenv/share/pysr/depot/packages/SymbolicRegression/RziqW/src/Population.jl:72 declares type variable T but does not use it.
WARNING: method definition for OneHotArray at /home/me/anaconda3/envs/myenv/share/pysr/depot/packages/MicroCollections/yJPLe/src/onehot.jl:79 declares type variable N but does not use it.
WARNING: method definition for adapt_structure at /home/me/anaconda3/envs/myenv/share/pysr/depot/packages/Transducers/DSfBv/src/partitionby.jl:50 declares type variable inbounds but does not use it.
WARNING: method definition for _foldl_array at /home/me/anaconda3/envs/myenv/share/pysr/depot/packages/Transducers/DSfBv/src/processes.jl:222 declares type variable T but does not use it.
WARNING: method definition for multiplyexistingvar at /home/me/anaconda3/envs/myenv/share/pysr/depot/packages/DynamicPolynomials/juS7t/src/mult.jl:1 declares type variable C but does not use it.
WARNING: method definition for multiplyexistingvar at /home/me/anaconda3/envs/myenv/share/pysr/depot/packages/DynamicPolynomials/juS7t/src/mult.jl:6 declares type variable C but does not use it.
Started!
Traceback (most recent call last):
  File "14_test_pysr_srbenchmark.py", line 71, in <module>
    model.fit(Input, Output)
  File "/home/me/anaconda3/envs/myenv/lib/python3.7/site-packages/pysr/sr.py", line 1750, in fit
    self._run(X, y, mutated_params, weights=weights, seed=seed)
  File "/home/me/anaconda3/envs/myenv/lib/python3.7/site-packages/pysr/sr.py", line 1620, in _run
    addprocs_function=cluster_manager,
RuntimeError: <PyCall.jlwrap (in a Julia function called from Python)
JULIA: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:345 [inlined]
 [2] fetch
   @ ./task.jl:360 [inlined]
 [3] _EquationSearch(::SymbolicRegression.CoreModule.ProgramConstantsModule.SRThreaded, datasets::Vector{SymbolicRegression.CoreModule.DatasetModule.Dataset{Float32}}; niterations::Int64, options::Options{Tuple{typeof(+), typeof(*), typeof(-), typeof(/)}, Tuple{typeof(cos), typeof(exp), typeof(safe_log), typeof(sin)}, Nothing, Nothing, typeof(loss), Int64}, numprocs::Int64, procs::Nothing, runtests::Bool, saved_state::Nothing, addprocs_function::Nothing)
   @ SymbolicRegression ~/anaconda3/envs/myenv/share/pysr/depot/packages/SymbolicRegression/RziqW/src/SymbolicRegression.jl:649
 [4] EquationSearch(datasets::Vector{SymbolicRegression.CoreModule.DatasetModule.Dataset{Float32}}; niterations::Int64, options::Options{Tuple{typeof(+), typeof(*), typeof(-), typeof(/)}, Tuple{typeof(cos), typeof(exp), typeof(safe_log), typeof(sin)}, Nothing, Nothing, typeof(loss), Int64}, numprocs::Int64, procs::Nothing, multithreading::Bool, runtests::Bool, saved_state::Nothing, addprocs_function::Nothing)
   @ SymbolicRegression ~/anaconda3/envs/myenv/share/pysr/depot/packages/SymbolicRegression/RziqW/src/SymbolicRegression.jl:346
 [5] EquationSearch(X::Matrix{Float32}, y::Matrix{Float32}; niterations::Int64, weights::Nothing, varMap::Vector{String}, options::Options{Tuple{typeof(+), typeof(*), typeof(-), typeof(/)}, Tuple{typeof(cos), typeof(exp), typeof(safe_log), typeof(sin)}, Nothing, Nothing, typeof(loss), Int64}, numprocs::Int64, procs::Nothing, multithreading::Bool, runtests::Bool, saved_state::Nothing, addprocs_function::Nothing)
   @ SymbolicRegression ~/anaconda3/envs/myenv/share/pysr/depot/packages/SymbolicRegression/RziqW/src/SymbolicRegression.jl:295
 [6] #EquationSearch#21
   @ ~/anaconda3/envs/myenv/share/pysr/depot/packages/SymbolicRegression/RziqW/src/SymbolicRegression.jl:320 [inlined]
 [7] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Any, NTuple{8, Symbol}, NamedTuple{(:weights, :niterations, :varMap, :options, :numprocs, :multithreading, :saved_state, :addprocs_function), Tuple{Nothing, Int64, Vector{String}, Options{Tuple{typeof(+), typeof(*), typeof(-), typeof(/)}, Tuple{typeof(cos), typeof(exp), typeof(safe_log), typeof(sin)}, Nothing, Nothing, typeof(loss), Int64}, Int64, Bool, Nothing, Nothing}}})
   @ Base ./essentials.jl:731
 [8] _pyjlwrap_call(f::Function, args_::Ptr{PyCall.PyObject_struct}, kw_::Ptr{PyCall.PyObject_struct})
   @ PyCall ~/anaconda3/envs/myenv/share/pysr/depot/packages/PyCall/ygXW2/src/callback.jl:32
 [9] pyjlwrap_call(self_::Ptr{PyCall.PyObject_struct}, args_::Ptr{PyCall.PyObject_struct}, kw_::Ptr{PyCall.PyObject_struct})
   @ PyCall ~/anaconda3/envs/myenv/share/pysr/depot/packages/PyCall/ygXW2/src/callback.jl:44

    nested task error: TaskFailedException
    Stacktrace:
     [1] wait
       @ ./task.jl:345 [inlined]
     [2] fetch
       @ ./task.jl:360 [inlined]
     [3] (::SymbolicRegression.var"#46#77"{Vector{Vector{Task}}, Int64, Int64})()
       @ SymbolicRegression ./task.jl:484

        nested task error: UndefVarError: T not defined
        Stacktrace:
         [1] best_of_sample(pop::Population{Float32}, running_search_statistics::SymbolicRegression.AdaptiveParsimonyModule.RunningSearchStatistics, options::Options{Tuple{typeof(+), typeof(*), typeof(-), typeof(/)}, Tuple{typeof(cos), typeof(exp), typeof(safe_log), typeof(sin)}, Nothing, Nothing, typeof(loss), Int64})
           @ SymbolicRegression.PopulationModule ~/anaconda3/envs/myenv/share/pysr/depot/packages/SymbolicRegression/RziqW/src/Population.jl:89
         [2] reg_evol_cycle(dataset::SymbolicRegression.CoreModule.DatasetModule.Dataset{Float32}, pop::Population{Float32}, temperature::Float32, curmaxsize::Int64, running_search_statistics::SymbolicRegression.AdaptiveParsimonyModule.RunningSearchStatistics, options::Options{Tuple{typeof(+), typeof(*), typeof(-), typeof(/)}, Tuple{typeof(cos), typeof(exp), typeof(safe_log), typeof(sin)}, Nothing, Nothing, typeof(loss), Int64}, record::Dict{String, Any})
           @ SymbolicRegression.RegularizedEvolutionModule ~/anaconda3/envs/myenv/share/pysr/depot/packages/SymbolicRegression/RziqW/src/RegularizedEvolution.jl:0
         [3] s_r_cycle(dataset::SymbolicRegression.CoreModule.DatasetModule.Dataset{Float32}, pop::Population{Float32}, ncycles::Int64, curmaxsize::Int64, running_search_statistics::SymbolicRegression.AdaptiveParsimonyModule.RunningSearchStatistics; verbosity::Int64, options::Options{Tuple{typeof(+), typeof(*), typeof(-), typeof(/)}, Tuple{typeof(cos), typeof(exp), typeof(safe_log), typeof(sin)}, Nothing, Nothing, typeof(loss), Int64}, record::Dict{String, Any})
           @ SymbolicRegression.SingleIterationModule ~/anaconda3/envs/myenv/share/pysr/depot/packages/SymbolicRegression/RziqW/src/SingleIteration.jl:37
         [4] macro expansion
           @ ~/anaconda3/envs/myenv/share/pysr/depot/packages/SymbolicRegression/RziqW/src/SymbolicRegression.jl:573 [inlined]
         [5] (::SymbolicRegression.var"#44#75"{SymbolicRegression.CoreModule.ProgramConstantsModule.SRThreaded, Options{Tuple{typeof(+), typeof(*), typeof(-), typeof(/)}, Tuple{typeof(cos), typeof(exp), typeof(safe_log), typeof(sin)}, Nothing, Nothing, typeof(loss), Int64}, Vector{Vector{Task}}, Int64, SymbolicRegression.AdaptiveParsimonyModule.RunningSearchStatistics, Int64, SymbolicRegression.CoreModule.DatasetModule.Dataset{Float32}, Int64})()
           @ SymbolicRegression ./threadingconstructs.jl:258>
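The five sample rows printed at the top of the log are consistent with Nguyen-1, f(x) = x^3 + x^2 + x. A quick NumPy sanity check, using only the values copied from that printout (it does not call the `get_benchmark_data` helper from the snippet, whose code is not shown here):

```python
import numpy as np

# First five inputs and targets, copied from the printed Input[:5] / Output[:5].
x = np.array([0.85119328, -0.72956365, 0.33353343, 0.95291893, 0.68468416])
y = np.array([2.19243833, -0.58562036, 0.48188176, 2.72627573, 1.47445129])


def nguyen1(x):
    """Nguyen-1 target: x^3 + x^2 + x (no numeric constants)."""
    return x**3 + x**2 + x


# Tolerance allows for the 8-decimal rounding in the printout.
print(np.allclose(nguyen1(x), y, atol=1e-6))
```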

Here's the code snippet I used:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '6'

import time
import numpy as np
# import sympy as sp
# import torch
# import pandas as pd

from pysr import PySRRegressor

from utils.data import get_benchmark_data

np.random.seed(0)

X, Y, use_constant, expression, variables_name = get_benchmark_data('benchmark.csv',
                        'Nguyen-1',
                        1000)

Input = X
Output = Y
print(X.shape)
print(Y.shape)
print(Input[:5])
print(Output[:5])

np.random.seed(0)
model = PySRRegressor(
    # random_state=0,
    # deterministic=True,
    # Make a PySR search give the same result every run.
    # To use this, you must turn off parallelism (with procs=0, multithreading=False), 
    # and set random_state to a fixed seed. Default is False.
    # procs=0,
    # multithreading=False,
    niterations=1000,  # < Increase me for better results
    binary_operators=["+", "*", "-", "/"],
    # should_optimize_constants=use_constant,
    # complexity_of_constants=100,  # to prevent PySR from finding scalars; uncommenting this line triggers the error
    unary_operators=[
        "cos",
        "exp",
        "log",
        "sin",
        # "inv(x) = 1/x",
        # "neg(x) = -x",
        # ^ Custom operator (julia syntax)
    ],
    # extra_sympy_mappings={"inv": lambda x: 1 / x,
    #                       "neg": lambda x: -x},
    # ^ Define operator for SymPy as well
    loss="loss(prediction, target) = (prediction - target)^2",
    # ^ Custom loss function (julia syntax)
)

start_time = time.time()
np.random.seed(0)
model.fit(Input, Output)
end_time = time.time()
time_cost = end_time - start_time
print('time_cost', time_cost)

print(model)
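The custom `loss` string in the snippet, `loss(prediction, target) = (prediction - target)^2`, is a per-point squared error written in Julia syntax. For readers more at home in Python, a NumPy equivalent is sketched below; the `dataset_loss` helper is hypothetical and assumes that, for unweighted data, the per-point losses are simply averaged over the dataset.

```python
import numpy as np


def loss(prediction, target):
    # Same formula as the Julia string: squared residual for a single point.
    return (prediction - target) ** 2


def dataset_loss(predictions, targets):
    # Assumption: unweighted per-point losses are averaged over all points.
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.mean(loss(predictions, targets)))


print(dataset_loss([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```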

Interestingly, with the line complexity_of_constants=100 commented out (as in the snippet above), the code runs without any errors; the error only appears when that line is enabled. Do you have any insights into this issue?

Version

0.11.5

Operating System

Linux

Package Manager

None

Interface

Script (i.e., python my_script.py)

Relevant log output

No response

Extra Info

No response

MilesCranmer commented 1 year ago

Thanks for the bug report. Could you check whether this goes away in the latest version of PySR (v0.14.1)?

x66ccff commented 1 year ago

Yeah, this bug goes away in v0.14.1. Thank you!
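Since the error was reported on 0.11.5 and confirmed gone in v0.14.1, a script that relies on `complexity_of_constants` could check the installed version before starting a search. A minimal sketch: the helper names are hypothetical, `importlib.metadata` needs Python 3.8+ (the traceback above shows Python 3.7, where the `importlib_metadata` backport would be required instead), and only plain `X.Y.Z` version strings are parsed.

```python
# Hypothetical guard: 0.14.1 is the release in which the reporter confirmed
# that setting complexity_of_constants no longer crashes the search.
MIN_FIXED_VERSION = (0, 14, 1)


def parse_version(v):
    """Turn a plain 'X.Y.Z' release string into a tuple of ints.

    Pre-release or local suffixes (e.g. '0.14.1rc1') are not supported.
    """
    return tuple(int(part) for part in v.split(".")[:3])


def has_complexity_of_constants_fix(v):
    return parse_version(v) >= MIN_FIXED_VERSION


def installed_pysr_version():
    """Return the installed PySR version string, or None if unavailable."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        return None
    try:
        return version("pysr")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_pysr_version()
    if installed is None:
        print("PySR version could not be determined")
    else:
        try:
            ok = has_complexity_of_constants_fix(installed)
        except ValueError:
            print(f"Unrecognized PySR version string: {installed}")
        else:
            print(f"PySR {installed}: " + ("OK" if ok else "upgrade to 0.14.1 or later"))
```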