cavalab / srbench

A living benchmark framework for symbolic regression
https://cavalab.org/srbench/
GNU General Public License v3.0

Add PySR & SymbolicRegression.jl to suite #62

Closed · MilesCranmer closed this 2 years ago

MilesCranmer commented 2 years ago

This adds the install script and the scikit-learn API. I also added a Julia install step to the GitHub action.

I wasn't sure what the tuned directory was for so I have left it for now. I also wasn't sure how you typically deal with choices of operators so I have added some different choices to the hyperparameter search.

Let me know what else I need to add. Thanks! Cheers, Miles

lacava commented 2 years ago

Hi @MilesCranmer , thanks for this PR! Looks like there's a TypeError being thrown:

ml = 'PySRRegressor'

    @pytest.mark.parametrize("ml", MLs)
    def test_evaluate_model(ml):
        print('running test_evaluate_model with ml=',ml)
        dataset = 'test/192_vineyard_small.tsv.gz'
        results_path = 'tmp_results'
        random_state = 42

        algorithm = importlib.__import__('methods.'+ml,globals(),
                                         locals(),
                                       ['est','hyper_params','complexity'])

        print('algorithm imported:',algorithm)
        evaluate_model(dataset,
                       results_path,
                       random_state,
                       ml,
                       algorithm.est,
                       algorithm.hyper_params,
                       algorithm.complexity,
                       algorithm.model,
>                      test=True # testing
                      )

test_evaluate_model.py:38: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
evaluate_model.py:139: in evaluate_model
    print('training',grid_est)
/usr/share/miniconda/envs/srbench/lib/python3.7/site-packages/sklearn/base.py:260: in __repr__
    repr_ = pp.pformat(self)
/usr/share/miniconda/envs/srbench/lib/python3.7/pprint.py:144: in pformat
    self._format(object, sio, 0, 0, {}, 0)
/usr/share/miniconda/envs/srbench/lib/python3.7/pprint.py:161: in _format
    rep = self._repr(object, context, level)
/usr/share/miniconda/envs/srbench/lib/python3.7/pprint.py:393: in _repr
    self._depth, level)
/usr/share/miniconda/envs/srbench/lib/python3.7/site-packages/sklearn/utils/_pprint.py:181: in format
    changed_only=self._changed_only)
/usr/share/miniconda/envs/srbench/lib/python3.7/site-packages/sklearn/utils/_pprint.py:437: in _safe_repr
    v, context, maxlevels, level, changed_only=changed_only)
/usr/share/miniconda/envs/srbench/lib/python3.7/site-packages/sklearn/utils/_pprint.py:446: in _safe_repr
    rep = repr(object)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[TypeError("'float' object is not subscriptable") raised in repr()] PySRRegressor object at 0x7fcf7094a750>

    def __repr__(self):
>       return f"PySRRegressor(equations={self.get_best()['sympy_format']})"
E       TypeError: 'float' object is not subscriptable

methods/PySRRegressor.py:48: TypeError
----------------------------- Captured stdout call -----------------------------
running test_evaluate_model with ml= PySRRegressor
algorithm imported: <module 'methods.PySRRegressor' from '/home/runner/work/srbench/srbench/experiment/methods/PySRRegressor.py'>
========================================
Evaluating PySRRegressor on 
test/192_vineyard_small.tsv.gz
========================================
compression: gzip
filename: test/192_vineyard_small.tsv.gz
scaling X
scaling y
X_train: (15, 3)
y_train: (15,)
test mode enabled
hyper_params set to {}
training 
=============================== warnings summary ===============================

Also, OperonRegressor is failing - is this something new, @foolnotion?

foolnotion commented 2 years ago

@lacava yes, there have been some major changes, sorry about that. I will PR an updated install script.

MilesCranmer commented 2 years ago

Thanks, just fixed!

lacava commented 2 years ago

hm, still seeing a type error...

folivetti commented 2 years ago

It seems your PySRRegressor isn't sklearn-compatible. Have a look at this code: https://gist.github.com/folivetti/609bc9b854c51968ef90aa675ccaa60d

I guess you just need to add "from sklearn.base import BaseEstimator, RegressorMixin" and then change "class PySRRegressor:" to "class PySRRegressor(BaseEstimator, RegressorMixin):".
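For illustration, here is a minimal sketch of that kind of sklearn-compatible skeleton (the class name and the fit/predict bodies are placeholders, not PySR's actual internals):

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin

# Minimal sklearn-compatible regressor skeleton (placeholder logic only)
class MySRRegressor(BaseEstimator, RegressorMixin):
    def __init__(self, niterations=40):
        # sklearn expects __init__ to only store parameters, unchanged
        self.niterations = niterations

    def fit(self, X, y):
        # ... run the symbolic regression search here ...
        self.best_equation_ = None  # fitted attributes get a trailing underscore
        return self

    def predict(self, X):
        # ... evaluate the best equation on X here ...
        return np.zeros(X.shape[0])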

MilesCranmer commented 2 years ago

Fixed!

MilesCranmer commented 2 years ago

It should actually work now. Sorry for the delay. The way PySR was being run inside the srbench test changed the definition of stdin within Julia (called via PyJulia), which caused some type errors. I've fixed it now and it seems to be running.

MilesCranmer commented 2 years ago

Here it is working (using PR #65 for quicker CI): https://github.com/MilesCranmer/srbench/runs/4909749492?check_suite_focus=true [screenshot of the passing CI run]

I reduced the number of possible hyperparams since it was fairly slow. Will the hyperparams I give affect the benchmark, or will you use different ones? The hyperparams I listed were for speedy testing but may not have great performance.

MilesCranmer commented 2 years ago

@lacava I've been trying to get Julia built directly inside the conda environment, but it's much more difficult than I thought. Is it okay if I switch back to building it from the GitHub action as before? That seemed to work.

MilesCranmer commented 2 years ago

Never mind, it seems like it's working directly from conda now! Ready for review.

lacava commented 2 years ago

I wasn't sure what the tuned directory was for so I have left it for now. I also wasn't sure how you typically deal with choices of operators so I have added some different choices to the hyperparameter search.

we'll populate the tuned directory for methods that are tested on PMLB. check out issue #24 for a discussion on setting the hyperparameters.

MilesCranmer commented 2 years ago

Thanks, will read in-depth soon. Just so I understand, you are limiting methods to 500k evaluations, as in 500k total equations tried on a dataset? PySR can do 500k evaluations in ~1 second on a 4-core laptop... would this mean that PySR's benchmark will only allow for 1 second of search time? (PySR's algorithm is very evaluation-oriented)

Maybe there is a fixed-wall clock time test where PySR can be benchmarked as well?

(The reason it can get to this speed is basically because I optimized the equation evaluation for like six months... I even fuse subtrees of operators into a single compiled kernel to get a speedup! At one point I even tried automatically caching parts of equations too.)

gkronber commented 2 years ago

PySR can do 500k evaluations in ~1 second on a 4-core laptop...

This sounds impressive and would be at least an order of magnitude faster than operon.

I assume the number of evaluations per second depends on the number of rows in the dataset and on the average size of individuals. Please correct me if I'm wrong, but assuming 200 rows and an average size of 50 operations per individual, this means PySR reaches approximately 5e5 * 200 * 50 = 5e9 FLOP/s = 5 GFLOPS on your notebook.

MilesCranmer commented 2 years ago

Here's an example with the equation x1*x2 + cos(x3), with 200 rows, using PySR's Julia API, SymbolicRegression.jl.

using BenchmarkTools
using SymbolicRegression

# 200 rows, 3 features:
X = randn(3, 200)

# Enable + * / - cos sin operators:
options = Options(binary_operators=(+, *, /, -), unary_operators=(cos, sin))

# Create equation
tree = Node("x1") * Node("x2") + cos(Node("x3"))

# Set up evaluation function:
testfunc() = evalTreeArray(tree, X, options)

# Evaluate performance:
@btime testfunc()

This gives 1.817 µs on my laptop.

So 4 cores would be equivalent to about 2,201,430 evaluations per second. In practice, even with this optimization, equation evaluation is still the most expensive part, so the mutation steps shouldn't add much.

(The equation itself isn't compiled, but the operations used to evaluate the tree are, so the performance should be similar for a random tree.)
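As a quick sanity check of that figure, the arithmetic in Python (assuming perfect scaling over 4 cores):

time_per_eval = 1.817e-6                       # measured single-core time per tree evaluation, in seconds
evals_per_second_per_core = 1 / time_per_eval  # ≈ 550,358 evaluations per second on one core
print(4 * evals_per_second_per_core)           # ≈ 2.2 million evaluations per second on 4 cores, matching the figure above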

lacava commented 2 years ago

This discussion probably belongs somewhere other than this PR, but in short, our goal with SRBench has been to benchmark methods, not implementations (to the extent possible). We do measure and compare running time alongside accuracy and complexity and report them all together. FWIW, the synthetic benchmark upped the evals to 1M but removed the tuning step (hence the 'tuned' folder).

MilesCranmer commented 2 years ago

I agree this makes sense in principle, but choices in the implementation often inform what methods are even possible, so I don't think the two can be fully disentangled. For example, in PySR several design choices were made to increase evaluation speed, and these changed the underlying evolutionary algorithm*. Maybe the best compromise would be to have separate benchmarks: one for raw wall-clock performance and one for sample efficiency. From a user's perspective, I think the wall-clock benchmark would be much more useful for deciding which tool to use.

But yes, maybe best left to separate thread.

* for example, it's often faster to split a population of individuals into small fixed-size groups, then operate on the groups independently (in parallel) before merging, even though this changes the algorithm itself.
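A generic Python sketch of that split-evolve-merge pattern (a generic illustration, not PySR's actual implementation):

import random

def evolve_group(group):
    # Placeholder: mutate/crossover/select within this small group only
    return group

def evolve_population(population, group_size=50):
    # Split into fixed-size groups, evolve each independently
    # (in parallel in practice), then merge back into one population.
    random.shuffle(population)
    groups = [population[i:i + group_size]
              for i in range(0, len(population), group_size)]
    evolved = [evolve_group(g) for g in groups]  # the parallelizable step
    return [ind for g in evolved for ind in g]   # merge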

lacava commented 2 years ago

the methods all seem to be passing the tests, but then this is happening at the end:

free(): invalid pointer
/home/runner/work/_temp/32bbc723-8bfa-4eae-a161-16dc7a15da38.sh: line 4: 12210 Aborted                 (core dumped) python -m pytest -v test_evaluate_model.py
Error: Process completed with exit code 134.

MilesCranmer commented 2 years ago

Hm, this is interesting. free() is obviously a C function, yet PySR has no C dependencies in the main search code, which is pure Julia. The other PySR deps are very stable libraries like numpy/sympy, which I wouldn't expect to cause such an issue.

However, I know that PyJulia links directly into the Python binary, which can change some behaviour, like signal handling.

Does this script run all algorithms in one single kernel evaluation, or does it start a new Python kernel for each algorithm? I wonder if there is some interaction between algorithms.

MilesCranmer commented 2 years ago

Edited test_evaluate_model.py to take a single algorithm as an argument, and added test_evaluate_model.sh to replicate the loop over all *.py files in the methods folder except __init__.py. This re-launches the Python kernel for each algorithm, which should prevent interference.
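Roughly, the loop does something along these lines (a Python equivalent shown for illustration; the "--ml" option name is an assumption, not the actual interface):

import glob
import subprocess
import sys

# Launch a fresh Python kernel (pytest process) for each algorithm file,
# so state from one method cannot leak into the next.
for path in sorted(glob.glob("methods/*.py")):
    if path.endswith("__init__.py"):
        continue
    ml = path.rsplit("/", 1)[-1][:-3]  # module name, e.g. "PySRRegressor"
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-v", "test_evaluate_model.py",
         "--ml", ml]  # "--ml" is a hypothetical option name for illustration
    )
    if result.returncode != 0:
        sys.exit(result.returncode)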

lacava commented 2 years ago

I reduced the number of possible hyperparams since it was fairly slow. Will the hyperparams I give affect the benchmark, or will you use different ones? The hyperparams I listed were for speedy testing but may not have great performance.

The hyperparams you give will affect the benchmark (whenever we fix them and run it). For v2.0, we limited methods to six combinations.

However, having many hyperparameter combinations shouldn't affect the speed of the tests: when you call evaluate_model.py with the test=True flag, it skips tuning. You can edit that conditional if you want to further restrict settings for the test. (We should probably handle that better in the future.)
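For context, that shortcut looks roughly like this (a sketch based on the test output above, not the exact code; here `test` is the flag passed into evaluate_model):

# Sketch of the test-mode shortcut inside evaluate_model() (not the exact code)
if test:
    print('test mode enabled')
    hyper_params = {}   # drop the tuning grid so no hyperparameter search runs
    print('hyper_params set to', hyper_params)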

lacava commented 2 years ago

Does this script run all algorithms in one single kernel evaluation, or does it start a new Python kernel for each algorithm? I wonder if there is some interaction between algorithms.

For tests, pytest runs everything in one consistent kernel, I imagine - I guess that is why you turned it into a script. For the experiment, each run gets a new kernel; we run batch scripts to evaluate in analyze.py.