cavalab / srbench

A living benchmark framework for symbolic regression
https://cavalab.org/srbench/
GNU General Public License v3.0

Update PySR from 0.7 to 0.16 #146

Closed MilesCranmer closed 1 year ago

MilesCranmer commented 1 year ago

This updates PySR from 0.7 to 0.15. Fixes #145.

Edit: changed to 0.16

MilesCranmer commented 1 year ago

FYI the build errors look to be from other packages rather than PySR

folivetti commented 1 year ago

FYI the build errors look to be from other packages rather than PySR

If I traced back correctly, the issue is in here

https://github.com/lacava/AI-Feynman/blob/master/setup.py

@lacava I think changing this file should fix it.

MilesCranmer commented 1 year ago

Ping on this. Let me know how I can help.

lacava commented 1 year ago

hi sorry, catching up from vacation. i'm seeing errors from the cache actions... just updated them to v3 and testing now.

lacava commented 1 year ago

@MilesCranmer

=================================== FAILURES ===================================
______________________ test_evaluate_model[PySRRegressor] ______________________

ml = 'PySRRegressor'

    def test_evaluate_model(ml):
        print('running test_evaluate_model with ml=',ml)
        dataset = 'test/192_vineyard_small.tsv.gz'
        results_path = 'tmp_results'
        random_state = 42

        algorithm = importlib.__import__('methods.'+ml,globals(),
                                         locals(),
                                       ['est','hyper_params','complexity'])

        print('algorithm imported:',algorithm)
        evaluate_model(dataset,
                       results_path,
                       random_state,
                       ml,
                       algorithm.est,
                       algorithm.hyper_params,
                       algorithm.complexity,
                       algorithm.model,
>                      test=True # testing
                      )

test_evaluate_model.py:30: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
evaluate_model.py:144: in evaluate_model
    grid_est.fit(X_train_scaled, y_train_scaled)
/usr/share/miniconda3/envs/srbench/lib/python3.7/site-packages/sklearn/model_selection/_search_successive_halving.py:262: in fit
    super().fit(X, y=y, groups=groups, **fit_params)
/usr/share/miniconda3/envs/srbench/lib/python3.7/site-packages/sklearn/model_selection/_search.py:926: in fit
    self.best_estimator_.fit(X, y, **fit_params)
/usr/share/miniconda3/envs/srbench/lib/python3.7/site-packages/pysr/sr.py:1910: in fit
    X, y, Xresampled, weights, variable_names, X_units, y_units
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = PySRRegressor.equations_ = None
X = array([[-1.61280322,  0.89761968, -0.11108968],
       [-1.8941061 , -1.17381036, -1.9857281 ],
       [ 0.35631699, -...7440595,  0.51378979],
       [ 0.77827132, -0.65595285, -0.73596916],
       [-1.19084889,  1.41547719,  0.51378979]])
y = array([ 0.7339187 , -1.92754472, -0.59681301, -1.32266667, -0.11291057,
       -0.35486179,  0.85489431, -0.11291057, -1.56461789,  1.09684553,
        0.85489431, -0.35486179,  1.58074797,  0.61294309,  0.61294309])
Xresampled = None, weights = None, variable_names = None, X_units = None
y_units = None

    def _validate_and_set_fit_params(
        self, X, y, Xresampled, weights, variable_names, X_units, y_units
    ):
        """
        Validate the parameters passed to the :term`fit` method.

        This method also sets the `nout_` attribute.

        Parameters
        ----------
        X : ndarray | pandas.DataFrame
            Training data of shape `(n_samples, n_features)`.
        y : ndarray | pandas.DataFrame}
            Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
            Will be cast to `X`'s dtype if necessary.
        Xresampled : ndarray | pandas.DataFrame
            Resampled training data used for denoising,
            of shape `(n_resampled, n_features)`.
        weights : ndarray | pandas.DataFrame
            Weight array of the same shape as `y`.
            Each element is how to weight the mean-square-error loss
            for that particular element of y.
        variable_names : list[str] of length n_features
            Names of each variable in the training dataset, `X`.
        X_units : list[str] of length n_features
            Units of each variable in the training dataset, `X`.
        y_units : str | list[str] of length n_out
            Units of each variable in the training dataset, `y`.

        Returns
        -------
        X_validated : ndarray of shape (n_samples, n_features)
            Validated training data.
        y_validated : ndarray of shape (n_samples,) or (n_samples, n_targets)
            Validated target data.
        Xresampled : ndarray of shape (n_resampled, n_features)
            Validated resampled training data used for denoising.
        variable_names_validated : list[str] of length n_features
            Validated list of variable names for each feature in `X`.
        X_units : list[str] of length n_features
            Validated units for `X`.
        y_units : str | list[str] of length n_out
            Validated units for `y`.

        """
        if isinstance(X, pd.DataFrame):
            if variable_names:
                variable_names = None
                warnings.warn(
                    "`variable_names` has been reset to `None` as `X` is a DataFrame. "
                    "Using DataFrame column names instead."
                )

            if (
                pd.api.types.is_object_dtype(X.columns)
                and X.columns.str.contains(" ").any()
            ):
                X.columns = X.columns.str.replace(" ", "_")
                warnings.warn(
                    "Spaces in DataFrame column names are not supported. "
                    "Spaces have been replaced with underscores. \n"
                    "Please rename the columns to valid names."
                )
        elif variable_names and any([" " in name for name in variable_names]):
            variable_names = [name.replace(" ", "_") for name in variable_names]
            warnings.warn(
                "Spaces in `variable_names` are not supported. "
                "Spaces have been replaced with underscores. \n"
                "Please use valid names instead."
            )

        # Data validation and feature name fetching via sklearn
        # This method sets the n_features_in_ attribute
        if Xresampled is not None:
            Xresampled = check_array(Xresampled)
        if weights is not None:
            weights = check_array(weights, ensure_2d=False)
            check_consistent_length(weights, y)
        X, y = self._validate_data(X=X, y=y, reset=True, multi_output=True)
        self.feature_names_in_ = _check_feature_names_in(
>           self, variable_names, generate_names=False
        )
E       TypeError: _check_feature_names_in() got an unexpected keyword argument 'generate_names'

/usr/share/miniconda3/envs/srbench/lib/python3.7/site-packages/pysr/sr.py:1442: TypeError
----------------------------- Captured stdout call -----------------------------
running test_evaluate_model with ml= PySRRegressor
algorithm imported: <module 'methods.PySRRegressor' from '/home/runner/work/srbench/srbench/experiment/methods/PySRRegressor.py'>
========================================
Evaluating PySRRegressor on 
test/192_vineyard_small.tsv.gz
========================================
compression: gzip
filename: test/192_vineyard_small.tsv.gz
scaling X
scaling y
X_train: (15, 3)
y_train: (15,)
test mode enabled
hyper_params set to {}
setting niterations =2 for test
setting population_size =20 for test
training HalvingGridSearchCV(cv=KFold(n_splits=2, random_state=42, shuffle=True),
                    error_score=0.0,
                    ('estimator', PySRRegressor.equations_ = None),
                    n_jobs=1, param_grid={}, scoring='r2', verbose=2)
n_iterations: 1
n_required_iterations: 1
n_possible_iterations: 1
min_resources_: 15
max_resources_: 15
aggressive_elimination: False
factor: 3
----------
iter: 0
n_candidates: 1
n_resources: 15
Fitting 2 folds for each of 1 candidates, totalling 2 fits
[CV] END .................................................... total time=   0.0s
[CV] END .................................................... total time=   0.0s
=========================== short test summary info ============================
FAILED test_evaluate_model.py::test_evaluate_model[PySRRegressor] - TypeError: _check_feature_names_in() got an unexpected keyword argument 'generate_names'
============================== 1 failed in 2.13s ===============================

MilesCranmer commented 1 year ago

Could you rerun the test? 0.16 fixed that (just pushed)

MilesCranmer commented 1 year ago

The alternative fix is to upgrade the scikit-learn version. But 0.16 avoids that requirement

MilesCranmer commented 1 year ago

The log shows this:

Collecting sklearn
  Downloading sklearn-0.0.post7.tar.gz (3.6 kB)

I think one of the packages is trying to install sklearn rather than scikit-learn:

      cwd: /tmp/pip-install-q99j2_z7/sklearn_aefd367c6cbf48d7b1174e30ed4e91ff/
    Complete output (18 lines):
    The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
    rather than 'sklearn' for pip commands.

MilesCranmer commented 1 year ago

One of aifeynman, bsr, DistanceClassifier, or eigency is still installing sklearn rather than scikit-learn:

Building wheels for collected packages: aifeynman, bsr, DistanceClassifier, eigency, sklearn
  Building wheel for aifeynman (setup.py): started
  Building wheel for aifeynman (setup.py): finished with status 'done'
  Created wheel for aifeynman: filename=aifeynman-2.0.7.6-cp37-cp37m-linux_x86_64.whl size=559089 sha256=d03f785ff244c97a07d6ab940bb85279bde93fb1b2d70f43314299e265a4d755
  Stored in directory: /tmp/pip-ephem-wheel-cache-gxnhbwh9/wheels/7d/ba/ce/50485bb4b52a13cf8d7f2b35f1c7803de401c66bc647dd7dcc
  Building wheel for bsr (setup.py): started
  Building wheel for bsr (setup.py): finished with status 'done'
  Created wheel for bsr: filename=bsr-0.1.1.1-py3-none-any.whl size=15000 sha256=b6767a5d35b75e17b7cafafe1ad3b7b00d142e7d890f8fec9cdffcd0d00bb329
  Stored in directory: /tmp/pip-ephem-wheel-cache-gxnhbwh9/wheels/ad/a7/b6/db926ef90244bb33b971a74dc07dfc0d832f9f7763a7e7d0d9
  Building wheel for DistanceClassifier (setup.py): started
  Building wheel for DistanceClassifier (setup.py): finished with status 'done'
  Created wheel for DistanceClassifier: filename=DistanceClassifier-0.0.8-py3-none-any.whl size=5832 sha256=a52d93c87eea3012283918b949d93060c1a2324bde5ea7284cf3cf0a42c86e1f
  Stored in directory: /home/runner/.cache/pip/wheels/f0/0f/18/a1f15ff08c5ccb370f80f83a2813c44c56dcca9ab2f2743741
  Building wheel for eigency (setup.py): started
  Building wheel for eigency (setup.py): finished with status 'done'
  Created wheel for eigency: filename=eigency-1.77-cp37-cp37m-linux_x86_64.whl size=982053 sha256=39006f4c632c00db5f71f2aade1339f6fe934989f8469db66d9f55fa34995488
  Stored in directory: /home/runner/.cache/pip/wheels/89/49/a1/a68041d54e02a621df31e2bb71220bfaaefecbfec5bf3ac9c1
  Building wheel for sklearn (setup.py): started
  Building wheel for sklearn (setup.py): finished with status 'done'
  Created wheel for sklearn: filename=sklearn-0.0.post7-py3-none-any.whl size=2361 sha256=fd2205f918d7f698db981e39e439a98b6c67cb8a2c8d2e4b5a37e73ee7ce9572
  Stored in directory: /home/runner/.cache/pip/wheels/ae/cd/c1/7044aa9eba19c0e761bd045ad4d91b9939538ed908b4d5d789
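
One way to locate which vendored dependency still declares the bare `sklearn` requirement is to scan the `setup.py` files in the source tree. A hypothetical helper, not part of srbench; the regex and scan root are assumptions:

```python
import re
from pathlib import Path

# Matches a quoted requirement like 'sklearn' or "sklearn>=0.0",
# but not 'scikit-learn'.
_SKLEARN_REQ = re.compile(r"""['"]sklearn(?:[<>=!~;, ][^'"]*)?['"]""")

def find_sklearn_offenders(root="."):
    """Return paths of setup.py files that depend on the deprecated
    'sklearn' PyPI alias instead of 'scikit-learn'."""
    return [
        str(p)
        for p in sorted(Path(root).rglob("setup.py"))
        if _SKLEARN_REQ.search(p.read_text(errors="ignore"))
    ]
```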

MilesCranmer commented 1 year ago

@lacava The culprit is ffx: https://github.com/natekupp/ffx/blob/master/setup.py

    install_requires=['click>=5.0', 'contextlib2>=0.5.4', 'numpy', 'pandas', 'six', 'sklearn',],
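
The fix would be a one-line change to that requirements list, swapping the deprecated alias for the real distribution name (a sketch of the edited list only, not ffx's full `setup.py`):

```python
# install_requires for ffx's setup.py, with the deprecated 'sklearn'
# PyPI alias replaced by the actual distribution name.
install_requires = [
    "click>=5.0",
    "contextlib2>=0.5.4",
    "numpy",
    "pandas",
    "six",
    "scikit-learn",  # was: 'sklearn'
]
```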

folivetti commented 1 year ago

Piggybacking on this error: in the refactoring of the repo, should I mark algorithms that have had no activity in their repos for a long time as unmaintained? We could maybe split the installation into active projects and legacy ones. @lacava

lacava commented 1 year ago

Piggybacking on this error: in the refactoring of the repo, should I mark algorithms that have had no activity in their repos for a long time as unmaintained? We could maybe split the installation into active projects and legacy ones. @lacava

i'm working on separate environments for each method on this branch: https://github.com/cavalab/srbench/tree/separate-envs

@lacava The culprit is ffx: https://github.com/natekupp/ffx/blob/master/setup.py

good catch! i was looking all over 😆

MilesCranmer commented 1 year ago

@lacava it looks like the other remaining culprit is AI-Feynman:

https://github.com/lacava/AI-Feynman/blob/d53c055ca17d0684fd7ba7d72df8856668e2e132/setup.py#L47

MilesCranmer commented 1 year ago

With AI-Feynman commented out, and also the setuptools requirement loosened to any version (some packages require a more recent one), I can confirm that the environment builds!

lacava commented 1 year ago

@MilesCranmer could you re-check why the test for PySR is failing? It seems to have to do with the call to pysr.install(). If we can get it to build and test i'm happy to merge despite the problems with other methods. https://github.com/cavalab/srbench/actions/runs/5882205176/job/15953024910?pr=146

MilesCranmer commented 1 year ago

Done. I just switched to the conda-forge version of PySR, which also installs the conda-forge version of Julia.

(I was using the pip version only so that installation would work for people on Windows and Apple Silicon, but my guess is that they'd run into other issues anyway, so it's not worth it.)

MilesCranmer commented 1 year ago

Argh. TensorFlow 1.14 requires Python < 3.8. But conda-forge dropped support for Python 3.7 about a year ago, meaning no recent conda-forge package can be installed unless we bump TensorFlow. What do you recommend?

MilesCranmer commented 1 year ago

Trying to install tensorflow==1.14 via pip instead...

MilesCranmer commented 1 year ago

Probably the easiest thing to do would be to have a separate conda environment per algorithm. I remember now that I actually did that for the PySR paper's fork of srbench: https://github.com/MilesCranmer/pysr_paper/blob/main/benchmark/build_container.md – otherwise it was just too hard to build everything in a single env

lacava commented 1 year ago

ah, that explains why I can't get the environment to resolve for DSR on the new separate environments branch.

Honestly, I think it makes more sense to put energy into resolving the remaining errors on the separate-envs branch: every method has its own environment there, so we don't get these kinds of issues. The PySR build setup on that branch needs to be updated to use the conda package, and a number of other updates are being introduced, including the requirement that algorithms return sympy-compatible models. current tests
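
The one-environment-per-method layout could be driven by a small loop like the following. A sketch only: the `experiment/methods/*/environment.yml` layout and the `srbench-` naming are assumptions about the separate-envs branch, not its actual conventions:

```python
import subprocess
from pathlib import Path

def build_method_envs(methods_dir="experiment/methods"):
    """Create one conda environment per method from its own
    environment.yml, so mutually incompatible pins (e.g. an old
    tensorflow vs. recent conda-forge builds) never have to resolve
    inside a single env. Returns the environment names created."""
    created = []
    for env_file in sorted(Path(methods_dir).glob("*/environment.yml")):
        env_name = "srbench-" + env_file.parent.name
        subprocess.run(
            ["conda", "env", "create", "-f", str(env_file), "-n", env_name],
            check=True,
        )
        created.append(env_name)
    return created
```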

MilesCranmer commented 1 year ago

Sounds good. Moving to #148