MilesCranmer / PySR

High-Performance Symbolic Regression in Python and Julia
https://astroautomata.com/PySR
Apache License 2.0

Add Support for Arbitrary Precision Arithmetic with BigFloat #236

Closed dbl001 closed 1 year ago

dbl001 commented 1 year ago

Is your feature request related to a problem? Please describe.
I tried running pysr on a 1,000-row array with 4 integer input variables and one integer output variable, a Goedel number.

From Mathematica:

GoedelNumber[l_List] := Times @@ MapIndexed[Prime[First[#2]]^#1 &, l]

E.g.

Data file:
# 7 1   5   8   6917761200000

julia> 2^7*3^1*5^5*7^8
6917761200000

The model returned:

Complexity  Loss       Score     Equation
1           Inf       NaN       0.22984365

I am just learning pysr and maybe it's just user error. However, the Inf and NaN suggest that Goedel numbers may exceed Float64 precision.


Describe the solution you'd like
Not sure what happened, because the largest Goedel number in the input is 1.6679880978201e+23.
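
For context, Float64 has a 53-bit significand, so integers up to 2^53 (about 9.0e15) are represented exactly; a value near 1.7e23 is well within Float64 range but gets rounded. A quick check in numpy (the large constant below just illustrates the magnitude from the data file):

import numpy as np

g = 6917761200000  # 2^7 * 3^1 * 5^5 * 7^8, from the example above
print(np.float64(g) == g)  # True: below 2^53, exactly representable

big = 166798809782010000000000  # ~1.6679880978201e+23, far above 2^53
print(int(np.float64(big)) == big)  # False: rounded to the nearest Float64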

Additional context
I didn't see any parameters to set a verbose mode or print debugging information.

GoedelTableFourParameters.txt

MilesCranmer commented 1 year ago

Good idea. So far, precision only supports [16, 32, 64], but in principle, BigFloat should be easy to support.

You can hack it like this:

import pysr
jl = pysr.julia_helpers.init_julia()

# First, set the precision:
jl.setprecision(200)  # 200 bits

# Define `randn` so it works for BigFloat:
jl.eval("""
    using Random: AbstractRNG

    try
        # In case randn(BigFloat) gets defined in the future
        randn(BigFloat)
    catch e
        if isa(e, MethodError)
            Base.randn(::Type{BigFloat}, args::Integer...) = BigFloat.(randn(Float64, args...))
            Base.randn(rng::AbstractRNG, ::Type{BigFloat}, args::Integer...) = BigFloat.(randn(rng, Float64, args...))
        else
            throw(e)
        end
    end
""")

Then, you need to edit pysr/sr.py, at this line: https://github.com/MilesCranmer/PySR/blob/46fe8c163f147d8447993c80cba4bf0d2f43d461/pysr/sr.py#L1638. Add the lines:

Main.X = Main.eval("BigFloat.(X)")
Main.y = Main.eval("BigFloat.(y)")
if weights is not None:
    Main.weights = Main.eval("BigFloat.(weights)")

I think that should be it. Let me know if it works. I'd also love a pull request if you can figure out a nice way to make this configurable using the precision kwarg.

Cheers, Miles
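
(As a rough sketch of the precision-kwarg idea above: a hypothetical helper, not part of the PySR API, reusing the Main.eval casting pattern from this comment. Main is the Julia module object from init_julia.)

# Hypothetical sketch: cast the arrays already assigned to Julia's Main
# module to BigFloat whenever more than 64 bits of precision are requested.
def cast_to_bigfloat(Main, precision, weights=None):
    if precision > 64:
        Main.eval(f"setprecision({precision})")  # bits of BigFloat mantissa
        Main.X = Main.eval("BigFloat.(X)")
        Main.y = Main.eval("BigFloat.(y)")
        if weights is not None:
            Main.weights = Main.eval("BigFloat.(weights)")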

dbl001 commented 1 year ago

I have tried (i) custom loss functions, (ii) various built-in loss functions, (iii) different operators, (iv) a larger input file, and (v) weights, but the output loss value does not decrease.

Curiously, the loss value is on the order of magnitude of the Goedel numbers themselves.

model = PySRRegressor(
    loss="myloss(x::BigFloat, y::BigFloat) =  abs(x - y)",
    #loss="myloss(x::BigFloat, y::BigFloat, w::BigFloat) =  w * abs(x - y)",
    #loss="L1DistLoss()",
    niterations=10,
    binary_operators=["mult", "pow"],
    unary_operators=["square", "cube"],
)
model.fit(X=input, y=output)
#model.fit(X=input, y=output, weights=weights)

Cycles per second: 7.410e+01
Head worker occupation: 0.0%
Progress: 17 / 150 total iterations (11.333%)
==============================
Hall of Fame:
-----------------------------------------
Complexity  Loss       Score     Equation
1           3.353e+19  6.548e-01  7.445906796448958112542609059634996192301824964176114035162089e+12
5           3.352e+19  1.280e-04  cube(cube(x2 * x3))

...

Cycles per second: 5.370e+01
Head worker occupation: 0.0%
Progress: 22 / 150 total iterations (14.667%)
==============================
Hall of Fame:
-----------------------------------------
Complexity  Loss       Score     Equation
1           3.353e+19  6.548e-01  7.445906796448958112542609059634996192301824964176114035162089e+12
5           3.351e+19  1.419e-04  ((x0 * x3) ^ x1)
6           3.351e+19  1.660e-04  square(-0.376641516035376267002021677399170584976673126220703125 * (x1 ^ x3))
7           3.277e+19  2.230e-02  (((x3 * x1) ^ x2) * x0)
8           2.836e+19  1.445e-01  (((x3 * x1) ^ x2) * cube(x0))
9           2.746e+19  3.234e-02  ((((x3 * x1) ^ x2) * x3) * x0)

Do you see any mistakes?
Is there a 'trace' output mode?

GoedelNumber.ipynb.gz

MilesCranmer commented 1 year ago

It looks like the loss is going down, because 2.746e19 < 3.353e19, unless I misunderstand. Maybe the relation is just hard to find? What's the dataset you're trying to fit?

Note that you're running with only 10 niterations. You could try increasing that to 1000000 and watching the output to see if it eventually finds something.

Also, maybe you could try normalizing it first? i.e., y = (y - np.average(y))/np.std(y).
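
For instance, a minimal sketch of that normalization (plain numpy; input and output are the variable names from your earlier snippet, and predictions then need un-scaling):

import numpy as np

y_arr = np.asarray(output, dtype=np.float64)
y_mean, y_std = np.average(y_arr), np.std(y_arr)

# Fit on a zero-mean, unit-variance target:
model.fit(X=input, y=(y_arr - y_mean) / y_std)

# Predictions come back in scaled units, so invert the transform:
y_pred = model.predict(input) * y_std + y_mean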

dbl001 commented 1 year ago

Thanks! Here's the data file. GoedelTableFourParameters.txt

MilesCranmer commented 1 year ago

How did the suggestions work?

Also, what is the true relation?

dbl001 commented 1 year ago

Currently running.

GoedelNumber[l_List] := Product[Prime[i]^l[[i]], {i, Length[l]}]
Primes: 2, 3, 5, 7, ... raised to input values in data file X0, X1, X2, X3
 7  1   5   8   6917761200000
julia> 2^7*3^1*5^5*7^8
6917761200000

Just restarted:

Cycles per second: 4.410e+01
Head worker occupation: 0.0%
Progress: 38 / 15000000 total iterations (0.000%)
==============================
Hall of Fame:
-----------------------------------------
Complexity  Loss       Score     Equation
1           3.353e+19  6.548e-01  7.4573353679525459137524786825444414833514637971313149029982497e+12
5           3.352e+19  1.280e-04  cube(cube(x2 * x3))
6           3.351e+19  8.297e-05  (cube(-0.416052722554691001288773577471147291362285614013671875 * x3) ^ x2)
7           3.340e+19  3.517e-03  ((x2 ^ x0) * (x3 ^ x0))
9           3.095e+19  3.806e-02  (square(cube(x2 * x0)) * (x3 ^ x1))
10          3.091e+19  1.244e-03  (square(square((x1 * cube(x3)) * x2)) * x0)
13          3.013e+19  8.470e-03  (square(square((x1 * cube(x3)) * x2) * -0.403145459628434787990869381246739067137241363525390625) * square(x0))

MilesCranmer commented 1 year ago

Just to be clear, it can't find the true relation with those operators, right? Since the true relation would require a prime operator?

dbl001 commented 1 year ago

My 'myprime_abs' is getting an exception from 'assert_operators_defined_over_reals' in Configure.jl.

#myprime_function = "myprime_abs(x) = prime(rand(1:abs(Int(x))))"
myprime_function = "myprime_abs(x::Integer) = prime(convertInt64, (abs(x)))"
ri = _randint()

model = PySRRegressor(
    loss="myloss(x::BigFloat, y::BigFloat) =  abs(x - y)",
    #loss="L1DistLoss()",
    #niterations=1000000,
    niterations=10,
    #binary_operators=["mult", "pow"],
    #unary_operators=["square", "cube", myprime_function],
    unary_operators=[myprime_function],
    extra_sympy_mappings={
        "myprime_abs" : lambda x: sympy.prime(sympy.Abs(x))
    },
    # No prime inside prime:
    #nested_constraints={"myprime_abs": {"myprime_abs": 0}},
)
model.fit(X=input, y=output)
#model.fit(X=input, y=output, weights=weights)

Here's enhanced debugging:

JULIA: AssertionError: Your configuration is invalid - one of your operators (BigFloat) (Options(
    # Operators:
        binops=Function[+, *, -, /], unaops=Function[myprime_abs],
    # Loss:
        loss=myloss,
    # Complexity Management:
        maxsize=20, maxdepth=20, bin_constraints=[(-1, -1), (-1, -1), (-1, -1), (-1, -1)], una_constraints=[-1], use_frequency=true, use_frequency_in_tournament=true, parsimony=0.0032, warmup_maxsize_by=0.0, 
    # Search Size:
        npopulations=15, ncycles_per_iteration=550, npop=33, 
    # Migration:
        migration=true, hof_migration=true, fraction_replaced=0.000364, fraction_replaced_hof=0.035,
    # Tournaments:
        prob_pick_first=0.86, tournament_selection_n=10, topn=12, 
    # Constant tuning:
        perturbation_factor=0.076, probability_negate_constant=0.01, should_optimize_constants=true, optimizer_algorithm=BFGS, optimizer_probability=0.14, optimizer_nrestarts=2, optimizer_iterations=8,
    # Mutations:
        mutation_weights=MutationWeights(0.048, 0.47, 0.79, 5.1, 1.7, 0.002, 0.00023, 0.21, 0.0), crossover_probability=0.066, skip_mutation_failures=true
    # Annealing:
        annealing=false, alpha=0.1, 
    # Speed Tweaks:
        batching=false, batch_size=5000, fast_cycle=false, 
    # Logistics:
        output_file=hall_of_fame_2022-12-06_095936.707.csv, verbosity=1000000000, seed=3844562118, progress=false,
    # Early Exit:
        early_stop_condition=nothing, timeout_in_seconds=nothing,
).operators) (myprime_abs) (BigFloat[-100.0, -97.95918367346939703566022217273712158203125, -95.918367346938765649611013941466808319091796875, -93.877551020408162685271236114203929901123046875, -91.8367346938775455100767430849373340606689453125, -89.7959183673469425457369652576744556427001953125, -87.75510204081632537054247222840785980224609375, -85.71428571428572240620269440114498138427734375, -83.6734693877551052310082013718783855438232421875, -81.632653061224488055813708342611789703369140625, -79.591836734693885091473930515348911285400390625, -77.5510204081632537054247222840785980224609375, -75.5102040816326507410849444568157196044921875, -73.4693877551020477767451666295528411865234375, -71.4285714285714448124053888022899627685546875, -69.387755102040813426356180571019649505615234375, -67.346938775510210462016402743756771087646484375, -65.306122448979607497676624916493892669677734375, -63.26530612244897611162741668522357940673828125, -61.22448979591837314728763885796070098876953125, -59.18367346938774886666578822769224643707275390625, -57.142857142857138796898652799427509307861328125, -55.102040816326535832558874972164630889892578125, -53.06122448979591155193702434189617633819580078125, -51.020408163265301482169888913631439208984375, -48.979591836734698517830111086368560791015625, -46.93877551020407423720826045610010623931884765625, -44.89795918367347127286848262883722782135009765625, -42.857142857142861203101347200572490692138671875, -40.81632653061225113333421177230775356292724609375, -38.77551020408162685271236114203929901123046875, -36.73469387755102388837258331477642059326171875, -34.69387755102041381860544788651168346405029296875, -32.65306122448979664341095485724508762359619140625, -30.612244897959186573643819428980350494384765625, -28.57142857142856229302196879871189594268798828125, -26.53061224489795932868219097144901752471923828125, -24.4897959183673492589150555431842803955078125, -22.4489795918367320837205625139176845550537109375, -20.4081632653061291193807846866548061370849609375, -18.367346938775511944186291657388210296630859375, -16.3265306122448947689917986281216144561767578125, -14.28571428571428469922466319985687732696533203125, -12.24489795918368173488488537259399890899658203125, -10.20408163265305034883567714132368564605712890625, -8.16326530612244738449589931406080722808837890625, -6.12244897959183020930140628479421138763427734375, -4.081632653061234350388986058533191680908203125, -2.0408163265306171751944930292665958404541015625, 0.0, 2.0408163265306171751944930292665958404541015625, 4.081632653061234350388986058533191680908203125, 6.122448979591837314728763885796070098876953125, 8.1632653061224544899232569150626659393310546875, 10.204081632653043243408319540321826934814453125, 12.2448979591836604186028125695884227752685546875, 14.28571428571427759379730559885501861572265625, 16.3265306122448947689917986281216144561767578125, 18.367346938775511944186291657388210296630859375, 20.4081632653061291193807846866548061370849609375, 22.4489795918367320837205625139176845550537109375, 24.4897959183673492589150555431842803955078125, 26.5306122448979664341095485724508762359619140625, 28.571428571428583609304041601717472076416015625, 30.61224489795917946821646182797849178314208984375, 32.653061224489789537983597256243228912353515625, 34.69387755102039960775073268450796604156494140625, 36.73469387755102388837258331477642059326171875, 38.77551020408162685271236114203929901123046875, 40.81632653061225113333421177230775356292724609375, 
42.857142857142861203101347200572490692138671875, 44.89795918367347127286848262883722782135009765625, 46.938775510204095553490333259105682373046875, 48.979591836734698517830111086368560791015625, 51.020408163265301482169888913631439208984375, 53.061224489795904446509666740894317626953125, 55.10204081632652872713151737116277217864990234375, 57.142857142857138796898652799427509307861328125, 59.18367346938774886666578822769224643707275390625, 61.22448979591837314728763885796070098876953125, 63.26530612244897611162741668522357940673828125, 65.306122448979607497676624916493892669677734375, 67.346938775510210462016402743756771087646484375, 69.387755102040813426356180571019649505615234375, 71.428571428571416390695958398282527923583984375, 73.4693877551020477767451666295528411865234375, 75.5102040816326507410849444568157196044921875, 77.5510204081632537054247222840785980224609375, 79.591836734693885091473930515348911285400390625, 81.632653061224488055813708342611789703369140625, 83.67346938775511944186291657388210296630859375, 85.71428571428572240620269440114498138427734375, 87.75510204081632537054247222840785980224609375, 89.795918367346956756591680459678173065185546875, 91.83673469387753129922202788293361663818359375, 93.877551020408162685271236114203929901123046875, 95.918367346938765649611013941466808319091796875, 97.95918367346939703566022217273712158203125, 100.0])  is not well-defined over the real line. You can get around this by returning `NaN` for invalid inputs.
Stacktrace:
 [1] assert_operators_defined_over_reals(T::Type, options::Options{typeof(myloss), Int64, 0.86, 10})
   @ SymbolicRegression ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/Configure.jl:24
 [2] test_option_configuration(T::Type, options::Options{typeof(myloss), Int64, 0.86, 10})
   @ SymbolicRegression ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/Configure.jl:44
 [3] _EquationSearch(::SymbolicRegression.CoreModule.ProgramConstantsModule.SRThreaded, datasets::Vector{SymbolicRegression.CoreModule.DatasetModule.Dataset{BigFloat}}; niterations::Int64, options::Options{typeof(myloss), Int64, 0.86, 10}, numprocs::Nothing, procs::Nothing, addprocs_function::Nothing, runtests::Bool, saved_state::Nothing)
   @ SymbolicRegression ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:438
 [4] EquationSearch(datasets::Vector{SymbolicRegression.CoreModule.DatasetModule.Dataset{BigFloat}}; niterations::Int64, options::Options{typeof(myloss), Int64, 0.86, 10}, parallelism::String, numprocs::Nothing, procs::Nothing, addprocs_function::Nothing, runtests::Bool, saved_state::Nothing)
   @ SymbolicRegression ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:383
 [5] EquationSearch(X::Matrix{BigFloat}, y::Matrix{BigFloat}; niterations::Int64, weights::Nothing, varMap::Vector{String}, options::Options{typeof(myloss), Int64, 0.86, 10}, parallelism::String, numprocs::Nothing, procs::Nothing, addprocs_function::Nothing, runtests::Bool, saved_state::Nothing, multithreaded::Nothing)
   @ SymbolicRegression ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:320
 [6] #EquationSearch#21
   @ ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:345 [inlined]
 [7] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Any, NTuple{8, Symbol}, NamedTuple{(:weights, :niterations, :varMap, :options, :numprocs, :parallelism, :saved_state, :addprocs_function), Tuple{Nothing, Int64, Vector{String}, Options{typeof(myloss), Int64, 0.86, 10}, Nothing, String, Nothing, Nothing}}})
   @ Base ./essentials.jl:731
 [8] _pyjlwrap_call(f::Function, args_::Ptr{PyCall.PyObject_struct}, kw_::Ptr{PyCall.PyObject_struct})
   @ PyCall ~/.julia/packages/PyCall/7a7w0/src/callback.jl:32
 [9] pyjlwrap_call(self_::Ptr{PyCall.PyObject_struct}, args_::Ptr{PyCall.PyObject_struct}, kw_::Ptr{PyCall.PyObject_struct})
   @ PyCall ~/.julia/packages/PyCall/7a7w0/src/callback.jl:44>

All the other operators in PySR take reals as input. https://astroautomata.com/PySR/operators/

However, Primes.prime takes an integer input. How should this issue in 'assert_operators_defined_over_reals' in Configure.jl be handled? Perhaps I have other issues that I'm missing.

jl = pysr.julia_helpers.init_julia()

import sympy

jl.eval("using Pkg")

jl.eval('Pkg.add("Primes")')
jl.eval("using Primes")

# If necessary:
jl.eval('Pkg.add("SpecialFunctions")')

jl.eval("using SymbolicRegression")
jl.eval('Pkg.add("SymbolicRegression")')
jl.eval('ENV["JULIA_DEBUG"] = "SymbolicRegression"')

# First, set the precision:
jl.setprecision(200)  # 200 bits

# Define `randn` so it works for BigFloat:
jl.eval("""
    using Random: AbstractRNG

    try
        # In case randn(BigFloat) gets defined in the future
        randn(BigFloat)
    catch e
        if isa(e, MethodError)
            Base.randn(::Type{BigFloat}, args::Integer...) = BigFloat.(randn(Float64, args...))
            Base.randn(rng::AbstractRNG, ::Type{BigFloat}, args::Integer...) = BigFloat.(randn(rng, Float64, args...))
        else
            throw(e)
        end
    end
""")
jl.eval("""

    try
        # In case abs(BigFloat) gets defined in the future
        Base.abs(x::BigFloat) = sign(x) * x
    catch e
        if isa(e, MethodError)
            Base.abs(::Type{BigFloat}, args::Integer...) = BigFloat.(abs(Float64, args...))
        else
            throw(e)
        end
    end
""")

MilesCranmer commented 1 year ago

What is happening is your operator is outputting an integer, rather than a BigFloat. You will need to convert back to a BigFloat at the end. Define it like this:

myprime_abs(x::BigFloat) = convert(BigFloat, prime(round(Int, x)))

MilesCranmer commented 1 year ago

Also, if you want "myprime_abs" to show up in the output sympy, follow the tips here: https://github.com/MilesCranmer/PySR/discussions/233. But your solution with sympy.prime(sympy.Abs(x)) is good too!
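
If you'd rather keep the operator symbolic in the exported equations, an undefined sympy.Function is another option; this is a sketch that assumes extra_sympy_mappings accepts any callable:

import sympy
from pysr import PySRRegressor

# An undefined sympy function: exported equations then display
# myprime_abs(x0) symbolically instead of a numeric prime value.
# myprime_function is the Julia operator string defined earlier.
model = PySRRegressor(
    unary_operators=[myprime_function],
    extra_sympy_mappings={"myprime_abs": sympy.Function("myprime_abs")},
)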

MilesCranmer commented 1 year ago

TODO's for me to make this easier to debug are:

  1. Make the error message state: "Your operator needs to be defined over the real line, and output the same type as its input."
  2. Make the error message not dump the full Options(...) string, which makes it harder to read.
  3. Move #233 into the docs.

Would that help in the future?

dbl001 commented 1 year ago

Sounds good. I'm still getting the same exception in 'assert_operators_defined_over_reals' in Configure.jl with the recommended code changes:

#myprime_function = "myprime_abs(x) = prime(rand(1:abs(Int(x))))"
myprime_function = "myprime_abs(x::BigFloat) = Convert(BigFloat, prime(round(Int, x)))"

ri = _randint()

model = PySRRegressor(
    loss="myloss(x::BigFloat, y::BigFloat) =  abs(x - y)",
    #loss="L1DistLoss()",
    #niterations=1000000,
    niterations=10,
    #binary_operators=["mult", "pow"],
    #unary_operators=["square", "cube", myprime_function],
    unary_operators=[myprime_function],
    extra_sympy_mappings={
        "myprime_abs" : lambda x: sympy.prime(sympy.Abs(x))
    },
    # No prime inside prime:
    #nested_constraints={"myprime_abs": {"myprime_abs": 0}},
)
model.fit(X=input, y=output)
#model.fit(X=input, y=output, weights=weights)

Here's the stack trace:

/Users/davidlaxer/anaconda3/envs/pysr/lib/python3.9/site-packages/pysr/sr.py:1257: UserWarning: Note: it looks like you are running in Jupyter. The progress bar will be turned off.
  warnings.warn(
/Users/davidlaxer/anaconda3/envs/pysr/lib/python3.9/site-packages/pysr/julia_helpers.py:201: UserWarning: Julia has already started. The new Julia options {'threads': 16} will be ignored.
  warnings.warn(
┌ Warning: You are using multithreading mode, but only one thread is available. Try starting julia with --threads=auto.
└ @ SymbolicRegression ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:414
myprime_abs, -100.0

--------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [13], line 19
      3 ri = _randint()
      5 model = PySRRegressor(
      6     loss="myloss(x::BigFloat, y::BigFloat) =  abs(x - y)",
      7     #loss="L1DistLoss()",
   (...)
     17     #nested_constraints={"myprime_abs": {"myprime_abs": 0}},
     18 )
---> 19 model.fit(X=input, y=output)

File ~/anaconda3/envs/pysr/lib/python3.9/site-packages/pysr/sr.py:1797, in PySRRegressor.fit(self, X, y, Xresampled, weights, variable_names)
   1794     self._checkpoint()
   1796 # Perform the search:
-> 1797 self._run(X, y, mutated_params, weights=weights, seed=seed)
   1799 # Then, after fit, we save again, so the pickle file contains
   1800 # the equations:
   1801 if not self.temp_equation_file:

File ~/anaconda3/envs/pysr/lib/python3.9/site-packages/pysr/sr.py:1657, in PySRRegressor._run(self, X, y, mutated_params, weights, seed)
   1651 cprocs = (
   1652     None if parallelism in ["serial", "multithreading"] else int(self.procs)
   1653 )
   1655 # Call to Julia backend.
   1656 # See https://github.com/MilesCranmer/SymbolicRegression.jl/blob/master/src/SymbolicRegression.jl
-> 1657 self.raw_julia_state_ = SymbolicRegression.EquationSearch(
   1658     Main.X,
   1659     Main.y,
   1660     weights=Main.weights,
   1661     niterations=int(self.niterations),
   1662     varMap=self.feature_names_in_.tolist(),
   1663     options=options,
   1664     numprocs=cprocs,
   1665     parallelism=parallelism,
   1666     saved_state=self.raw_julia_state_,
   1667     addprocs_function=cluster_manager,
   1668 )
   1670 # Set attributes
   1671 self.equations_ = self.get_hof()

RuntimeError: <PyCall.jlwrap (in a Julia function called from Python)
JULIA: AssertionError: Your configuration is invalid - one of your operators (BigFloat) (Options(
    # Operators:
        binops=Function[+, *, -, /], unaops=Function[myprime_abs],
    # Loss:
        loss=myloss,
    # Complexity Management:
        maxsize=20, maxdepth=20, bin_constraints=[(-1, -1), (-1, -1), (-1, -1), (-1, -1)], una_constraints=[-1], use_frequency=true, use_frequency_in_tournament=true, parsimony=0.0032, warmup_maxsize_by=0.0, 
    # Search Size:
        npopulations=15, ncycles_per_iteration=550, npop=33, 
    # Migration:
        migration=true, hof_migration=true, fraction_replaced=0.000364, fraction_replaced_hof=0.035,
    # Tournaments:
        prob_pick_first=0.86, tournament_selection_n=10, topn=12, 
    # Constant tuning:
        perturbation_factor=0.076, probability_negate_constant=0.01, should_optimize_constants=true, optimizer_algorithm=BFGS, optimizer_probability=0.14, optimizer_nrestarts=2, optimizer_iterations=8,
    # Mutations:
        mutation_weights=MutationWeights(0.048, 0.47, 0.79, 5.1, 1.7, 0.002, 0.00023, 0.21, 0.0), crossover_probability=0.066, skip_mutation_failures=true
    # Annealing:
        annealing=false, alpha=0.1, 
    # Speed Tweaks:
        batching=false, batch_size=5000, fast_cycle=false, 
    # Logistics:
        output_file=hall_of_fame_2022-12-06_111514.776.csv, verbosity=1000000000, seed=3844562118, progress=false,
    # Early Exit:
        early_stop_condition=nothing, timeout_in_seconds=nothing,
).operators) (myprime_abs) (BigFloat[-100.0, -97.95918367346939703566022217273712158203125, -95.918367346938765649611013941466808319091796875, -93.877551020408162685271236114203929901123046875, -91.8367346938775455100767430849373340606689453125, -89.7959183673469425457369652576744556427001953125, -87.75510204081632537054247222840785980224609375, -85.71428571428572240620269440114498138427734375, -83.6734693877551052310082013718783855438232421875, -81.632653061224488055813708342611789703369140625, -79.591836734693885091473930515348911285400390625, -77.5510204081632537054247222840785980224609375, -75.5102040816326507410849444568157196044921875, -73.4693877551020477767451666295528411865234375, -71.4285714285714448124053888022899627685546875, -69.387755102040813426356180571019649505615234375, -67.346938775510210462016402743756771087646484375, -65.306122448979607497676624916493892669677734375, -63.26530612244897611162741668522357940673828125, -61.22448979591837314728763885796070098876953125, -59.18367346938774886666578822769224643707275390625, -57.142857142857138796898652799427509307861328125, -55.102040816326535832558874972164630889892578125, -53.06122448979591155193702434189617633819580078125, -51.020408163265301482169888913631439208984375, -48.979591836734698517830111086368560791015625, -46.93877551020407423720826045610010623931884765625, -44.89795918367347127286848262883722782135009765625, -42.857142857142861203101347200572490692138671875, -40.81632653061225113333421177230775356292724609375, -38.77551020408162685271236114203929901123046875, -36.73469387755102388837258331477642059326171875, -34.69387755102041381860544788651168346405029296875, -32.65306122448979664341095485724508762359619140625, -30.612244897959186573643819428980350494384765625, -28.57142857142856229302196879871189594268798828125, -26.53061224489795932868219097144901752471923828125, -24.4897959183673492589150555431842803955078125, -22.4489795918367320837205625139176845550537109375, -20.4081632653061291193807846866548061370849609375, -18.367346938775511944186291657388210296630859375, -16.3265306122448947689917986281216144561767578125, -14.28571428571428469922466319985687732696533203125, -12.24489795918368173488488537259399890899658203125, -10.20408163265305034883567714132368564605712890625, -8.16326530612244738449589931406080722808837890625, -6.12244897959183020930140628479421138763427734375, -4.081632653061234350388986058533191680908203125, -2.0408163265306171751944930292665958404541015625, 0.0, 2.0408163265306171751944930292665958404541015625, 4.081632653061234350388986058533191680908203125, 6.122448979591837314728763885796070098876953125, 8.1632653061224544899232569150626659393310546875, 10.204081632653043243408319540321826934814453125, 12.2448979591836604186028125695884227752685546875, 14.28571428571427759379730559885501861572265625, 16.3265306122448947689917986281216144561767578125, 18.367346938775511944186291657388210296630859375, 20.4081632653061291193807846866548061370849609375, 22.4489795918367320837205625139176845550537109375, 24.4897959183673492589150555431842803955078125, 26.5306122448979664341095485724508762359619140625, 28.571428571428583609304041601717472076416015625, 30.61224489795917946821646182797849178314208984375, 32.653061224489789537983597256243228912353515625, 34.69387755102039960775073268450796604156494140625, 36.73469387755102388837258331477642059326171875, 38.77551020408162685271236114203929901123046875, 40.81632653061225113333421177230775356292724609375, 
42.857142857142861203101347200572490692138671875, 44.89795918367347127286848262883722782135009765625, 46.938775510204095553490333259105682373046875, 48.979591836734698517830111086368560791015625, 51.020408163265301482169888913631439208984375, 53.061224489795904446509666740894317626953125, 55.10204081632652872713151737116277217864990234375, 57.142857142857138796898652799427509307861328125, 59.18367346938774886666578822769224643707275390625, 61.22448979591837314728763885796070098876953125, 63.26530612244897611162741668522357940673828125, 65.306122448979607497676624916493892669677734375, 67.346938775510210462016402743756771087646484375, 69.387755102040813426356180571019649505615234375, 71.428571428571416390695958398282527923583984375, 73.4693877551020477767451666295528411865234375, 75.5102040816326507410849444568157196044921875, 77.5510204081632537054247222840785980224609375, 79.591836734693885091473930515348911285400390625, 81.632653061224488055813708342611789703369140625, 83.67346938775511944186291657388210296630859375, 85.71428571428572240620269440114498138427734375, 87.75510204081632537054247222840785980224609375, 89.795918367346956756591680459678173065185546875, 91.83673469387753129922202788293361663818359375, 93.877551020408162685271236114203929901123046875, 95.918367346938765649611013941466808319091796875, 97.95918367346939703566022217273712158203125, 100.0])  is not well-defined over the real line. You can get around this by returning `NaN` for invalid inputs.
Stacktrace:
 [1] assert_operators_defined_over_reals(T::Type, options::Options{typeof(myloss), Int64, 0.86, 10})
   @ SymbolicRegression ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/Configure.jl:24
 [2] test_option_configuration(T::Type, options::Options{typeof(myloss), Int64, 0.86, 10})
   @ SymbolicRegression ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/Configure.jl:44
 [3] _EquationSearch(::SymbolicRegression.CoreModule.ProgramConstantsModule.SRThreaded, datasets::Vector{SymbolicRegression.CoreModule.DatasetModule.Dataset{BigFloat}}; niterations::Int64, options::Options{typeof(myloss), Int64, 0.86, 10}, numprocs::Nothing, procs::Nothing, addprocs_function::Nothing, runtests::Bool, saved_state::Nothing)
   @ SymbolicRegression ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:438
 [4] EquationSearch(datasets::Vector{SymbolicRegression.CoreModule.DatasetModule.Dataset{BigFloat}}; niterations::Int64, options::Options{typeof(myloss), Int64, 0.86, 10}, parallelism::String, numprocs::Nothing, procs::Nothing, addprocs_function::Nothing, runtests::Bool, saved_state::Nothing)
   @ SymbolicRegression ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:383
 [5] EquationSearch(X::Matrix{BigFloat}, y::Matrix{BigFloat}; niterations::Int64, weights::Nothing, varMap::Vector{String}, options::Options{typeof(myloss), Int64, 0.86, 10}, parallelism::String, numprocs::Nothing, procs::Nothing, addprocs_function::Nothing, runtests::Bool, saved_state::Nothing, multithreaded::Nothing)
   @ SymbolicRegression ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:320
 [6] #EquationSearch#21
   @ ~/anaconda3/envs/ai/share/julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:345 [inlined]
 [7] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Any, NTuple{8, Symbol}, NamedTuple{(:weights, :niterations, :varMap, :options, :numprocs, :parallelism, :saved_state, :addprocs_function), Tuple{Nothing, Int64, Vector{String}, Options{typeof(myloss), Int64, 0.86, 10}, Nothing, String, Nothing, Nothing}}})
   @ Base ./essentials.jl:731
 [8] _pyjlwrap_call(f::Function, args_::Ptr{PyCall.PyObject_struct}, kw_::Ptr{PyCall.PyObject_struct})
   @ PyCall ~/.julia/packages/PyCall/7a7w0/src/callback.jl:32
 [9] pyjlwrap_call(self_::Ptr{PyCall.PyObject_struct}, args_::Ptr{PyCall.PyObject_struct}, kw_::Ptr{PyCall.PyObject_struct})
   @ PyCall ~/.julia/packages/PyCall/7a7w0/src/callback.jl:44>

dbl001 commented 1 year ago

btw - I showed PySR yesterday to a UCSC pure math PhD student. He commented, 'This is going to put me out of a job.' ;-)

MilesCranmer commented 1 year ago

Hm, did you restart Python after making that change? Julia has multiple dispatch, which means multiple definitions of the same function will persist in the runtime until you exit. Maybe there's a function sitting around that has the old definition.

And, that's awesome to hear! 😎

dbl001 commented 1 year ago

The code is running in JupyterLab... I restarted the kernel and ran all the code again. Same AssertionError in 'assert_operators_defined_over_reals' in Configure.jl, with input myprime_abs, -100.0.

dbl001 commented 1 year ago

% julia
The latest version of Julia in the `release` channel is 1.8.3+0.x64.apple.darwin14. You currently have `1.8.3+0.x64` installed. Run:

  juliaup update

to install Julia 1.8.3+0.x64.apple.darwin14 and update the `release` channel to that version.
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.3 (2022-11-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> myprime_abs(x::BigFloat) = Convert(BigFloat, prime(round(Int, x)))
myprime_abs (generic function with 1 method)

julia> myprime_abs(-100)
ERROR: MethodError: no method matching myprime_abs(::Int64)
Closest candidates are:
  myprime_abs(::BigFloat) at REPL[1]:1
Stacktrace:
 [1] top-level scope
   @ REPL[2]:1

julia> myprime_abs(10)
ERROR: MethodError: no method matching myprime_abs(::Int64)
Closest candidates are:
  myprime_abs(::BigFloat) at REPL[1]:1
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1

julia> 

MilesCranmer commented 1 year ago

Oh, you need to write convert(BigFloat, ...) rather than Convert(BigFloat, ...). Maybe that's it.

dbl001 commented 1 year ago

% julia
The latest version of Julia in the `release` channel is 1.8.3+0.x64.apple.darwin14. You currently have `1.8.3+0.x64` installed. Run:

  juliaup update

to install Julia 1.8.3+0.x64.apple.darwin14 and update the `release` channel to that version.
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.3 (2022-11-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> myprime_abs(x::BigFloat) = convert(BigFloat, prime(round(Int, x)))
myprime_abs (generic function with 1 method)

julia> myprime_abs(-100)
ERROR: MethodError: no method matching myprime_abs(::Int64)
Closest candidates are:
  myprime_abs(::BigFloat) at REPL[1]:1
Stacktrace:
 [1] top-level scope
   @ REPL[2]:1

julia> myprime_abs(10)
ERROR: MethodError: no method matching myprime_abs(::Int64)
Closest candidates are:
  myprime_abs(::BigFloat) at REPL[1]:1
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1

julia> 

MilesCranmer commented 1 year ago

10 is an integer, not a BigFloat, so you can't pass it to x::BigFloat. SymbolicRegression.jl never passes integers to operators, though.

dbl001 commented 1 year ago

So, create a prime() method which takes BigFloat input?

julia> prime(convert(Int, 11))
31

julia> prime(convert(BigFloat, 11))
ERROR: MethodError: no method matching prime(::BigFloat)
Closest candidates are:
  prime(::Type{T}, ::Integer) where T<:Integer at ~/.julia/packages/Primes/auplV/src/Primes.jl:795
  prime(::Integer) at ~/.julia/packages/Primes/auplV/src/Primes.jl:796
Stacktrace:
 [1] top-level scope
   @ REPL[26]:1

MilesCranmer commented 1 year ago

I am saying this:


julia> myprime_abs(x::BigFloat) = convert(BigFloat, prime(round(Int, x)))
myprime_abs (generic function with 1 method)

julia> myprime_abs(-100)
ERROR: MethodError: no method matching myprime_abs(::Int64)
Closest candidates are:
  myprime_abs(::BigFloat) at REPL[1]:1
Stacktrace:
 [1] top-level scope
   @ REPL[2]:1

doesn't actually matter, because this:

julia> myprime_abs(BigFloat(-100))

works fine, and this is what SymbolicRegression.jl is actually going to use.

dbl001 commented 1 year ago

Hmmm... not sure what's going on here:

julia> myprime_abs(x::BigFloat) = convert(BigFloat, prime(round(Int, x)))
myprime_abs (generic function with 1 method)

julia> myprime_abs(BigFloat(-100))
ERROR: DomainError with -100:

Stacktrace:
 [1] prime
   @ ~/.julia/packages/Primes/auplV/src/Primes.jl:795 [inlined]
 [2] prime
   @ ~/.julia/packages/Primes/auplV/src/Primes.jl:796 [inlined]
 [3] myprime_abs(x::BigFloat)
   @ Main ./REPL[28]:1
 [4] top-level scope
   @ REPL[29]:1

MilesCranmer commented 1 year ago

Missing the abs?

MilesCranmer commented 1 year ago

Note also that prime isn't defined for 0, so you'll have to catch that case, e.g.:

myprime_abs(x::BigFloat) = (abs(x) <= 0.5) ? BigFloat(NaN) : convert(BigFloat, prime(round(Int, abs(x))))
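
For completeness, here is that guarded operator dropped back into the earlier PySRRegressor snippet (a sketch; it assumes Primes has already been loaded via jl.eval('using Primes') as above, and that input and output are the notebook arrays):

import sympy
from pysr import PySRRegressor

myprime_function = (
    "myprime_abs(x::BigFloat) = (abs(x) <= 0.5) ? BigFloat(NaN) : "
    "convert(BigFloat, prime(round(Int, abs(x))))"
)

model = PySRRegressor(
    loss="myloss(x::BigFloat, y::BigFloat) = abs(x - y)",
    niterations=10,
    unary_operators=[myprime_function],
    extra_sympy_mappings={"myprime_abs": lambda x: sympy.prime(sympy.Abs(x))},
)
model.fit(X=input, y=output)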

dbl001 commented 1 year ago

Right!

julia> myprime_abs(BigFloat(0))
NaN

It's training now! Stay tuned...

Cycles per second: 1.360e+02
Head worker occupation: 0.0%
Progress: 1 / 150 total iterations (0.667%)
==============================
Hall of Fame:
-----------------------------------------
Complexity  Loss       Score      Equation
1           3.353e+19  6.548e-01   7.4556707208519275907808763527632477379015835440821884926351654e+12
3           3.353e+19  -5.000e-11  (x1 - -7.4572865599439574279843300324102069260357278880950195573743473e+12)

MilesCranmer commented 1 year ago

Nice! Okay closing this... I'm afraid I can't help much with your specific use-case, but do raise an issue if there is anything you suspect is an obvious bug.

MilesCranmer commented 1 year ago

Also, since, e.g., prime(10000000) takes a really long time to evaluate, and the package might encounter any Int64 integer, you might want to put an upper bound on the input, e.g.:

myprime_abs(x::BigFloat) = (abs(x) <= 0.5 || abs(x) > 1000) ? BigFloat(NaN) : convert(BigFloat, prime(round(Int, abs(x))))

would limit x to [-1000, 1000] \ [-0.5, 0.5]; outside that domain the operator returns NaN.

dbl001 commented 1 year ago

Right!

MilesCranmer commented 1 year ago

I put up a working tutorial for this, since I think it's a nice example! https://astroautomata.com/PySR/examples/#7-julia-packages-and-types

dbl001 commented 1 year ago

Nice! ;-)

dbl001 commented 1 year ago

Is your new toy primes example missing pysr.install()? E.g.

pysr.install()

Also, I had to add:

jl.eval("""
    using Random: AbstractRNG

    try
        # In case randn(BigFloat) gets defined in the future
        randn(BigFloat)
    catch e
        if isa(e, MethodError)
            Base.randn(::Type{BigFloat}, args::Integer...) = BigFloat.(randn(Float64, args...))
            Base.randn(rng::AbstractRNG, ::Type{BigFloat}, args::Integer...) = BigFloat.(randn(rng, Float64, args...))
        else
            throw(e)
        end
    end
""")

MilesCranmer commented 1 year ago

BigFloats aren’t used in that example, and on that page it is implicitly assumed that the user already installed PySR.

dbl001 commented 1 year ago

I got this error running the toy example:

/Users/davidlaxer/anaconda3/envs/pysr/lib/python3.9/site-packages/pysr/sr.py:1257: UserWarning: Note: it looks like you are running in Jupyter. The progress bar will be turned off.
  warnings.warn(
/Users/davidlaxer/anaconda3/envs/pysr/lib/python3.9/site-packages/pysr/julia_helpers.py:201: UserWarning: Julia has already started. The new Julia options {'threads': 16} will be ignored.
  warnings.warn(
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [11], line 1
----> 1 model.fit(X, y)

File ~/anaconda3/envs/pysr/lib/python3.9/site-packages/pysr/sr.py:1797, in PySRRegressor.fit(self, X, y, Xresampled, weights, variable_names)
   1794     self._checkpoint()
   1796 # Perform the search:
-> 1797 self._run(X, y, mutated_params, weights=weights, seed=seed)
   1799 # Then, after fit, we save again, so the pickle file contains
   1800 # the equations:
   1801 if not self.temp_equation_file:

File ~/anaconda3/envs/pysr/lib/python3.9/site-packages/pysr/sr.py:1657, in PySRRegressor._run(self, X, y, mutated_params, weights, seed)
   1651 cprocs = (
   1652     None if parallelism in ["serial", "multithreading"] else int(self.procs)
   1653 )
   1655 # Call to Julia backend.
   1656 # See https://github.com/MilesCranmer/SymbolicRegression.jl/blob/master/src/SymbolicRegression.jl
-> 1657 self.raw_julia_state_ = SymbolicRegression.EquationSearch(
   1658     Main.X,
   1659     Main.y,
   1660     weights=Main.weights,
   1661     niterations=int(self.niterations),
   1662     varMap=self.feature_names_in_.tolist(),
   1663     options=options,
   1664     numprocs=cprocs,
   1665     parallelism=parallelism,
   1666     saved_state=self.raw_julia_state_,
   1667     addprocs_function=cluster_manager,
   1668 )
   1670 # Set attributes
   1671 self.equations_ = self.get_hof()

RuntimeError: <PyCall.jlwrap (in a Julia function called from Python)
JULIA: MethodError: no method matching randn(::Random.TaskLocalRNG, ::Type{BigFloat})
Closest candidates are:
  randn(::Random.AbstractRNG, ::Type{T}, !Matched::Tuple{Vararg{Int64, N}} where N) where T at ~/.julia/juliaup/julia-1.8.3+0.x64.apple.darwin14/share/julia/stdlib/v1.8/Random/src/normal.jl:235
  randn(::Random.AbstractRNG, ::Type{T}, !Matched::Integer, !Matched::Integer...) where T at ~/.julia/juliaup/julia-1.8.3+0.x64.apple.darwin14/share/julia/stdlib/v1.8/Random/src/normal.jl:238
  randn(::Random.AbstractRNG) at ~/.julia/juliaup/julia-1.8.3+0.x64.apple.darwin14/share/julia/stdlib/v1.8/Random/src/normal.jl:38
  ...
Stacktrace:
  [1] randn(#unused#::Type{BigFloat})
    @ Random ~/.julia/juliaup/julia-1.8.3+0.x64.apple.darwin14/share/julia/stdlib/v1.8/Random/src/normal.jl:191
  [2] make_random_leaf
    @ ~/.julia/packages/SymbolicRegression/37l4B/src/MutationFunctions.jl:153 [inlined]
  [3] append_random_op(tree::Node{BigFloat}, options::Options{L2DistLoss, Int64, 0.86, 10}, nfeatures::Int64; makeNewBinOp::Nothing)
    @ SymbolicRegression.MutationFunctionsModule ~/.julia/packages/SymbolicRegression/37l4B/src/MutationFunctions.jl:99
  [4] append_random_op
    @ ~/.julia/packages/SymbolicRegression/37l4B/src/MutationFunctions.jl:82 [inlined]
  [5] gen_random_tree(length::Int64, options::Options{L2DistLoss, Int64, 0.86, 10}, nfeatures::Int64, #unused#::Type{BigFloat})
    @ SymbolicRegression.MutationFunctionsModule ~/.julia/packages/SymbolicRegression/37l4B/src/MutationFunctions.jl:243
  [6] #2
    @ ./none:0 [inlined]
  [7] iterate
    @ ./generator.jl:47 [inlined]
  [8] collect
    @ ./array.jl:787 [inlined]
  [9] #Population#1
    @ ~/.julia/packages/SymbolicRegression/37l4B/src/Population.jl:34 [inlined]
 [10] #6
    @ ./none:0 [inlined]
 [11] iterate
    @ ./generator.jl:47 [inlined]
 [12] collect(itr::Base.Generator{UnitRange{Int64}, SymbolicRegression.SearchUtilsModule.var"#6#8"{Int64, Vector{SymbolicRegression.CoreModule.DatasetModule.Dataset{BigFloat}}, Options{L2DistLoss, Int64, 0.86, 10}}})
    @ Base ./array.jl:787
 [13] #5
    @ ./none:0 [inlined]
 [14] iterate
    @ ./generator.jl:47 [inlined]
 [15] collect(itr::Base.Generator{UnitRange{Int64}, SymbolicRegression.SearchUtilsModule.var"#5#7"{Int64, Vector{SymbolicRegression.CoreModule.DatasetModule.Dataset{BigFloat}}, Options{L2DistLoss, Int64, 0.86, 10}}})
    @ Base ./array.jl:787
 [16] init_dummy_pops
    @ ~/.julia/packages/SymbolicRegression/37l4B/src/SearchUtils.jl:50 [inlined]
 [17] _EquationSearch(::SymbolicRegression.CoreModule.ProgramConstantsModule.SRThreaded, datasets::Vector{SymbolicRegression.CoreModule.DatasetModule.Dataset{BigFloat}}; niterations::Int64, options::Options{L2DistLoss, Int64, 0.86, 10}, numprocs::Nothing, procs::Nothing, addprocs_function::Nothing, runtests::Bool, saved_state::Nothing)
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:476
 [18] EquationSearch(datasets::Vector{SymbolicRegression.CoreModule.DatasetModule.Dataset{BigFloat}}; niterations::Int64, options::Options{L2DistLoss, Int64, 0.86, 10}, parallelism::String, numprocs::Nothing, procs::Nothing, addprocs_function::Nothing, runtests::Bool, saved_state::Nothing)
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:383
 [19] EquationSearch(X::Matrix{BigFloat}, y::Matrix{BigFloat}; niterations::Int64, weights::Nothing, varMap::Vector{String}, options::Options{L2DistLoss, Int64, 0.86, 10}, parallelism::String, numprocs::Nothing, procs::Nothing, addprocs_function::Nothing, runtests::Bool, saved_state::Nothing, multithreaded::Nothing)
    @ SymbolicRegression ~/.julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:320
 [20] #EquationSearch#21
    @ ~/.julia/packages/SymbolicRegression/37l4B/src/SymbolicRegression.jl:345 [inlined]
 [21] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Any, NTuple{8, Symbol}, NamedTuple{(:weights, :niterations, :varMap, :options, :numprocs, :parallelism, :saved_state, :addprocs_function), Tuple{Nothing, Int64, Vector{String}, Options{L2DistLoss, Int64, 0.86, 10}, Nothing, String, Nothing, Nothing}}})
    @ Base ./essentials.jl:731
 [22] _pyjlwrap_call(f::Function, args_::Ptr{PyCall.PyObject_struct}, kw_::Ptr{PyCall.PyObject_struct})
    @ PyCall ~/.julia/packages/PyCall/7a7w0/src/callback.jl:32
 [23] pyjlwrap_call(self_::Ptr{PyCall.PyObject_struct}, args_::Ptr{PyCall.PyObject_struct}, kw_::Ptr{PyCall.PyObject_struct})
    @ PyCall ~/.julia/packages/PyCall/7a7w0/src/callback.jl:44>

MilesCranmer commented 1 year ago

Recall that you edited your local copy to convert to BigFloats. Reinstall pysr with pip install -U pysr to get the unmodified version back.