MilesCranmer / PySR

High-Performance Symbolic Regression in Python and Julia
https://astroautomata.com/PySR
Apache License 2.0

Add Constant(s) Declaration to the PySRRegressor? #264

Open dbl001 opened 1 year ago

dbl001 commented 1 year ago

I experimented with PySR on Feynman equation 76 (i.e. q*v/(2*pi*r)) to see whether it could 'learn' the constant pi. Using the full data file but only the first 1,000 rows, PySR generated:

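import numpy as np
from pysr import PySRRegressor

feynmanTable = np.loadtxt("/Users/davidlaxer/AI-Feynman/example_data/example_II.34.2a.txt")
input = feynmanTable[:, :3]   # first three columns are the inputs (x0, x1, x2)
output = feynmanTable[:, -1]  # last column is the target value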

model = PySRRegressor(
    loss="loss(x, y) = (x - y)^2",
    #loss="L1DistLoss()",
    niterations=1000,
    #niterations=10,
    binary_operators=["+", "*", "^", "-", "/"],
    unary_operators=["sin", "cos", "square", "log", "exp", "sqrt", "abs"],
    extra_sympy_mappings={},
)
model.fit(X=input[:1000], y=output[:1000])

...
PySRRegressor.equations_ = [
       pick      score                                           equation  \
    0         0.000000                                         0.56806177   
    1         0.232499                                   (1.3781261 / x2)   
    2         0.333228                                    (sqrt(x0) / x2)   
    3         0.314004                           ((x0 / x2) * 0.47050688)   
    4  >>>>  15.328202                    ((0.15915495 / (x2 / x1)) * x0)   
    5         0.063520  ((0.38872972 / abs((x2 / -0.40942314) / x1)) *...   
    6         0.033199  abs((abs(abs(0.17996888) / (x2 / x1)) * x0) / ...   

               loss  complexity  
    0  1.959071e-01           1  
    1  1.230565e-01           3  
    2  8.818312e-02           4  
    3  6.441921e-02           5  
    4  3.126860e-15           7  
    5  2.584335e-15          10  
    6  2.418315e-15          12  
]

With the feynman_problem.py interface:

model = PySRRegressor(
    loss="loss(x, y) = (x - y)^2",
    niterations=1000,
    binary_operators=["+", "*", "^", "-", "/"],
    unary_operators=["sin", "cos", "square", "log", "exp", "sqrt", "abs"],
    extra_sympy_mappings={},
)
model.fit(problem.X, problem.y)
...
problem = problem_list[74]
problem
Feynman Equation: II.34.2a|Form: q*v/(2*pi*r)

run_on_problem(problem)
...
Cycles per second: 6.180e+04
Head worker occupation: 2.1%
Progress: 14908 / 15000 total iterations (99.387%)
==============================
Hall of Fame:
-----------------------------------------
Complexity  Loss       Score     Equation
1           1.443e+02  4.228e-07  5.0480013
2           1.433e+02  7.515e-03  sqrt(x1)
3           6.734e+01  7.548e-01  (x0 / x2)
4           6.425e+01  4.697e-02  exp(4.2469196 - x2)
5           5.363e+01  1.808e-01  ((1.2044919 ^ x0) / x2)
6           2.026e+01  9.733e-01  abs(x1 / (x2 + -0.32283887))
7           5.886e-13  2.303e+01  (((x0 * 0.15915494) / x2) * x1)
18          3.238e-13  5.434e-02  abs((abs(abs(abs(abs(abs(abs(x0) * -0.39923882))) * -0.39864594)) / abs(abs(x2))) * x1)

('complexity                                                      18\n
loss                                                           0.0\n
score                                                     0.054342\n
equation         abs((abs(abs(abs(abs(abs(abs(x0) * -0.39923882...\n
sympy_format     0.159154934683391*Abs(x1*Abs(Abs(Abs(Abs(Abs(A...\n
lambda_format    PySRFunction(X=>0.159154934683391*Abs(x1*Abs(A...\n
Name: 7, dtype: object',
 'q*v/(2*pi*r)',
 {'time': 467.58346700668335,
  'problem': Feynman Equation: II.34.2a|Form: q*v/(2*pi*r)})
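
As a quick sanity check, the fitted scalars in both runs can be compared against 1/(2*pi). The snippet below is only a sketch: the numeric values are copied by hand from the outputs above, and the variable names are made up for illustration.

```python
import math

# Scalars copied from the complexity-7 equations in the two runs above.
found_constants = [0.15915495, 0.15915494]

# The constant the search would ideally recover.
target = 1.0 / (2.0 * math.pi)

for c in found_constants:
    rel_err = abs(c - target) / target
    print(f"{c:.8f} vs 1/(2*pi) = {target:.8f} (relative error {rel_err:.1e})")

# The complexity-18 equation hides the same value as a product of two scalars
# (the abs() wrappers remove the minus signs).
nested_product = 0.39923882 * 0.39864594
print(f"nested product: {nested_product:.8f}")
```

So PySR does recover the value of 1/(2*pi) to about seven significant digits, but only as an opaque float, not as a named constant.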

I know I can add the number pi as an additional column to the input data file. Do you think there would be any advantage to allowing constants to be specified in the PySRRegressor?

MilesCranmer commented 1 year ago

I like that idea! Feel free to make a PR. Maybe something of the following form?

model = PySRRegressor(
    complexity_of_constants=100  # to prevent PySR finding scalars
)

model.fit(X, y, constants={"pi": 3.14, "one": 1, "two": 2})

If not, I could eventually work on this, but it might take some time.

I think constants passed like this would basically be appended as additional columns to the input data X. Currently you have to add those constant-valued columns to X manually and set the variable_names yourself.
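
The manual workaround described here can be sketched with plain NumPy. Note that `add_constant_columns` is a hypothetical helper written for this illustration, not part of PySR, and the `constants=` keyword to `fit` exists only as a proposal in this thread.

```python
import numpy as np


def add_constant_columns(X, constants, variable_names=None):
    """Append one constant-valued column to X per entry in `constants`.

    Hypothetical helper (not a PySR function). Returns the augmented array
    and the matching list of variable names.
    """
    X = np.asarray(X, dtype=float)
    n_rows, n_features = X.shape
    if variable_names is None:
        # Fall back to x0, x1, ... names for the original features.
        variable_names = [f"x{i}" for i in range(n_features)]
    # Each constant becomes a column filled with that single value.
    const_cols = [np.full((n_rows, 1), float(v)) for v in constants.values()]
    X_aug = np.hstack([X] + const_cols)
    names = list(variable_names) + list(constants.keys())
    return X_aug, names


# Small synthetic example: three features, two rows.
X_demo = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
X_aug, names = add_constant_columns(X_demo, {"pi": np.pi, "two": 2.0}, ["q", "v", "r"])
print(names)
print(X_aug.shape)

# Assumed usage with PySR (left commented out so the sketch runs without it):
# model = PySRRegressor(complexity_of_constants=100)
# model.fit(X_aug, y, variable_names=names)
```

With the constants exposed as named features, an expression such as q * v / (two * pi * r) becomes reachable without the search having to fit a free scalar.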