Closed MilesCranmer closed 1 year ago
FYI the build errors look to be from other packages rather than PySR
If I traced back correctly, the issue is in here
https://github.com/lacava/AI-Feynman/blob/master/setup.py
@lacava I think changing this file should fix it.
Ping on this. Let me know how I can help.
hi sorry, catching up from vacation. i'm seeing errors from the cache actions... just updated them to v3 and testing now.
@MilesCranmer
=================================== FAILURES ===================================
______________________ test_evaluate_model[PySRRegressor] ______________________
ml = 'PySRRegressor'
def test_evaluate_model(ml):
    print('running test_evaluate_model with ml=',ml)
    dataset = 'test/192_vineyard_small.tsv.gz'
    results_path = 'tmp_results'
    random_state = 42
    algorithm = importlib.__import__('methods.'+ml,globals(),
                                     locals(),
                                     ['est','hyper_params','complexity'])
    print('algorithm imported:',algorithm)
    evaluate_model(dataset,
                   results_path,
                   random_state,
                   ml,
                   algorithm.est,
                   algorithm.hyper_params,
                   algorithm.complexity,
                   algorithm.model,
>                  test=True # testing
                   )
test_evaluate_model.py:30:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
evaluate_model.py:144: in evaluate_model
grid_est.fit(X_train_scaled, y_train_scaled)
/usr/share/miniconda3/envs/srbench/lib/python3.7/site-packages/sklearn/model_selection/_search_successive_halving.py:262: in fit
super().fit(X, y=y, groups=groups, **fit_params)
/usr/share/miniconda3/envs/srbench/lib/python3.7/site-packages/sklearn/model_selection/_search.py:926: in fit
self.best_estimator_.fit(X, y, **fit_params)
/usr/share/miniconda3/envs/srbench/lib/python3.7/site-packages/pysr/sr.py:1910: in fit
X, y, Xresampled, weights, variable_names, X_units, y_units
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = PySRRegressor.equations_ = None
X = array([[-1.61280322, 0.89761968, -0.11108968],
[-1.8941061 , -1.17381036, -1.9857281 ],
[ 0.35631699, -...7440595, 0.51378979],
[ 0.77827132, -0.65595285, -0.73596916],
[-1.19084889, 1.41547719, 0.51378979]])
y = array([ 0.7339187 , -1.92754472, -0.59681301, -1.32266667, -0.11291057,
-0.35486179, 0.85489431, -0.11291057, -1.56461789, 1.09684553,
0.85489431, -0.35486179, 1.58074797, 0.61294309, 0.61294309])
Xresampled = None, weights = None, variable_names = None, X_units = None
y_units = None
def _validate_and_set_fit_params(
    self, X, y, Xresampled, weights, variable_names, X_units, y_units
):
    """
    Validate the parameters passed to the :term:`fit` method.
    This method also sets the `nout_` attribute.

    Parameters
    ----------
    X : ndarray | pandas.DataFrame
        Training data of shape `(n_samples, n_features)`.
    y : ndarray | pandas.DataFrame
        Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
        Will be cast to `X`'s dtype if necessary.
    Xresampled : ndarray | pandas.DataFrame
        Resampled training data used for denoising,
        of shape `(n_resampled, n_features)`.
    weights : ndarray | pandas.DataFrame
        Weight array of the same shape as `y`.
        Each element is how to weight the mean-square-error loss
        for that particular element of y.
    variable_names : list[str] of length n_features
        Names of each variable in the training dataset, `X`.
    X_units : list[str] of length n_features
        Units of each variable in the training dataset, `X`.
    y_units : str | list[str] of length n_out
        Units of each variable in the training dataset, `y`.

    Returns
    -------
    X_validated : ndarray of shape (n_samples, n_features)
        Validated training data.
    y_validated : ndarray of shape (n_samples,) or (n_samples, n_targets)
        Validated target data.
    Xresampled : ndarray of shape (n_resampled, n_features)
        Validated resampled training data used for denoising.
    variable_names_validated : list[str] of length n_features
        Validated list of variable names for each feature in `X`.
    X_units : list[str] of length n_features
        Validated units for `X`.
    y_units : str | list[str] of length n_out
        Validated units for `y`.
    """
    if isinstance(X, pd.DataFrame):
        if variable_names:
            variable_names = None
            warnings.warn(
                "`variable_names` has been reset to `None` as `X` is a DataFrame. "
                "Using DataFrame column names instead."
            )
        if (
            pd.api.types.is_object_dtype(X.columns)
            and X.columns.str.contains(" ").any()
        ):
            X.columns = X.columns.str.replace(" ", "_")
            warnings.warn(
                "Spaces in DataFrame column names are not supported. "
                "Spaces have been replaced with underscores. \n"
                "Please rename the columns to valid names."
            )
    elif variable_names and any([" " in name for name in variable_names]):
        variable_names = [name.replace(" ", "_") for name in variable_names]
        warnings.warn(
            "Spaces in `variable_names` are not supported. "
            "Spaces have been replaced with underscores. \n"
            "Please use valid names instead."
        )
    # Data validation and feature name fetching via sklearn
    # This method sets the n_features_in_ attribute
    if Xresampled is not None:
        Xresampled = check_array(Xresampled)
    if weights is not None:
        weights = check_array(weights, ensure_2d=False)
        check_consistent_length(weights, y)
    X, y = self._validate_data(X=X, y=y, reset=True, multi_output=True)
    self.feature_names_in_ = _check_feature_names_in(
>       self, variable_names, generate_names=False
    )
E   TypeError: _check_feature_names_in() got an unexpected keyword argument 'generate_names'

/usr/share/miniconda3/envs/srbench/lib/python3.7/site-packages/pysr/sr.py:1442: TypeError
----------------------------- Captured stdout call -----------------------------
running test_evaluate_model with ml= PySRRegressor
algorithm imported: <module 'methods.PySRRegressor' from '/home/runner/work/srbench/srbench/experiment/methods/PySRRegressor.py'>
========================================
Evaluating PySRRegressor on
test/192_vineyard_small.tsv.gz
========================================
compression: gzip
filename: test/192_vineyard_small.tsv.gz
scaling X
scaling y
X_train: (15, 3)
y_train: (15,)
test mode enabled
hyper_params set to {}
setting niterations =2 for test
setting population_size =20 for test
training HalvingGridSearchCV(cv=KFold(n_splits=2, random_state=42, shuffle=True),
error_score=0.0,
('estimator', PySRRegressor.equations_ = None),
n_jobs=1, param_grid={}, scoring='r2', verbose=2)
n_iterations: 1
n_required_iterations: 1
n_possible_iterations: 1
min_resources_: 15
max_resources_: 15
aggressive_elimination: False
factor: 3
----------
iter: 0
n_candidates: 1
n_resources: 15
Fitting 2 folds for each of 1 candidates, totalling 2 fits
[CV] END .................................................... total time= 0.0s
[CV] END .................................................... total time= 0.0s
=========================== short test summary info ============================
FAILED test_evaluate_model.py::test_evaluate_model[PySRRegressor] - TypeError: _check_feature_names_in() got an unexpected keyword argument 'generate_names'
============================== 1 failed in 2.13s ===============================
Could you rerun the test? 0.16 fixed that (just pushed)
The alternative fix is to upgrade the scikit-learn version, but 0.16 avoids that requirement.
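The underlying problem is a signature mismatch: the installed PySR called scikit-learn's private `_check_feature_names_in` helper with a `generate_names` keyword that the older scikit-learn in the environment does not accept. As a generic illustration only (this is not PySR 0.16's actual fix; the thread just states that 0.16 resolves it), a caller can guard against this kind of drift by dropping keywords the target function does not support:

```python
import inspect

def _old_check_feature_names_in(estimator, input_features=None):
    # Stand-in for the older scikit-learn private helper, which does
    # not accept the newer `generate_names` keyword argument.
    return input_features

def call_dropping_unknown_kwargs(func, *args, **kwargs):
    """Call `func`, silently dropping keywords it does not accept."""
    params = inspect.signature(func).parameters
    has_var_kw = any(p.kind is inspect.Parameter.VAR_KEYWORD
                     for p in params.values())
    if not has_var_kw:
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **kwargs)

# The failing call passed `generate_names=False`; with the guard, the
# unknown keyword is dropped instead of raising TypeError.
names = call_dropping_unknown_kwargs(
    _old_check_feature_names_in, None, input_features=["x0", "x1"],
    generate_names=False,
)
```

Pinning compatible versions (as 0.16 effectively does) is still preferable to shimming private APIs like this in production code.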
The log shows this:
Collecting sklearn
Downloading sklearn-0.0.post7.tar.gz (3.6 kB)
I think one of the packages is trying to install sklearn
rather than scikit-learn
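One way to pin down which dependency pulls in the deprecated name is to search each package's `setup.py` for a literal `'sklearn'` requirement. A minimal sketch, where the temporary directory and the `ffx` stand-in file are illustrative rather than the real checkouts:

```python
# Sketch: find which dependency's setup.py still pins the deprecated
# 'sklearn' PyPI name (the temporary tree below is illustrative only).
import pathlib
import tempfile

root = pathlib.Path(tempfile.mkdtemp()) / "ffx"
root.mkdir(parents=True)
(root / "setup.py").write_text(
    "from setuptools import setup\n"
    "setup(name='ffx', install_requires=['numpy', 'pandas', 'sklearn'])\n"
)

# Walk the tree and flag every setup.py that declares 'sklearn'.
offenders = [
    p for p in root.parent.rglob("setup.py")
    if "'sklearn'" in p.read_text()
]
```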
:
cwd: /tmp/pip-install-q99j2_z7/sklearn_aefd367c6cbf48d7b1174e30ed4e91ff/
Complete output (18 lines):
The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
rather than 'sklearn' for pip commands.
One of aifeynman, bsr, DistanceClassifier, or eigency is still installing sklearn
rather than scikit-learn
:
Building wheels for collected packages: aifeynman, bsr, DistanceClassifier, eigency, sklearn
Building wheel for aifeynman (setup.py): started
Building wheel for aifeynman (setup.py): finished with status 'done'
Created wheel for aifeynman: filename=aifeynman-2.0.7.6-cp37-cp37m-linux_x86_64.whl size=559089 sha256=d03f785ff244c97a07d6ab940bb85279bde93fb1b2d70f43314299e265a4d755
Stored in directory: /tmp/pip-ephem-wheel-cache-gxnhbwh9/wheels/7d/ba/ce/50485bb4b52a13cf8d7f2b35f1c7803de401c66bc647dd7dcc
Building wheel for bsr (setup.py): started
Building wheel for bsr (setup.py): finished with status 'done'
Created wheel for bsr: filename=bsr-0.1.1.1-py3-none-any.whl size=15000 sha256=b6767a5d35b75e17b7cafafe1ad3b7b00d142e7d890f8fec9cdffcd0d00bb329
Stored in directory: /tmp/pip-ephem-wheel-cache-gxnhbwh9/wheels/ad/a7/b6/db926ef90244bb33b971a74dc07dfc0d832f9f7763a7e7d0d9
Building wheel for DistanceClassifier (setup.py): started
Building wheel for DistanceClassifier (setup.py): finished with status 'done'
Created wheel for DistanceClassifier: filename=DistanceClassifier-0.0.8-py3-none-any.whl size=5832 sha256=a52d93c87eea3012283918b949d93060c1a2324bde5ea7284cf3cf0a42c86e1f
Stored in directory: /home/runner/.cache/pip/wheels/f0/0f/18/a1f15ff08c5ccb370f80f83a2813c44c56dcca9ab2f2743741
Building wheel for eigency (setup.py): started
Building wheel for eigency (setup.py): finished with status 'done'
Created wheel for eigency: filename=eigency-1.77-cp37-cp37m-linux_x86_64.whl size=982053 sha256=39006f4c632c00db5f71f2aade1339f6fe934989f8469db66d9f55fa34995488
Stored in directory: /home/runner/.cache/pip/wheels/89/49/a1/a68041d54e02a621df31e2bb71220bfaaefecbfec5bf3ac9c1
Building wheel for sklearn (setup.py): started
Building wheel for sklearn (setup.py): finished with status 'done'
Created wheel for sklearn: filename=sklearn-0.0.post7-py3-none-any.whl size=2361 sha256=fd2205f918d7f698db981e39e439a98b6c67cb8a2c8d2e4b5a37e73ee7ce9572
Stored in directory: /home/runner/.cache/pip/wheels/ae/cd/c1/7044aa9eba19c0e761bd045ad4d91b9939538ed908b4d5d789
@lacava The culprit is ffx: https://github.com/natekupp/ffx/blob/master/setup.py
install_requires=['click>=5.0', 'contextlib2>=0.5.4', 'numpy', 'pandas', 'six', 'sklearn',],
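A hypothetical one-line patch to ffx's `install_requires` (an assumption for illustration, not a merged upstream change) would swap in the supported PyPI name:

```python
# Hypothetical fix for ffx's setup.py: replace the deprecated
# 'sklearn' PyPI name with 'scikit-learn'. Version pins are copied
# from the original file.
install_requires = [
    'click>=5.0',
    'contextlib2>=0.5.4',
    'numpy',
    'pandas',
    'six',
    'scikit-learn',  # was: 'sklearn' (deprecated on PyPI)
]
```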
Piggybacking on this error: in the refactoring of the repo, should I mark the algorithms without any activity in their repo for a long time as unmaintained? We could maybe split the installation into active projects and legacy ones. @lacava
i'm working on separate environments for each method on this branch: https://github.com/cavalab/srbench/tree/separate-envs
@lacava The culprit is ffx: https://github.com/natekupp/ffx/blob/master/setup.py
good catch! i was looking all over 😆
@lacava it looks like the other remaining culprit is AI-Feynman:
https://github.com/lacava/AI-Feynman/blob/d53c055ca17d0684fd7ba7d72df8856668e2e132/setup.py#L47
With AI-Feynman commented out, and also the setuptools
requirement loosened to any version (some packages require a more recent one), I can confirm that the environment builds!
@MilesCranmer could you re-check why the test for PySR is failing? It seems to have to do with the call to pysr.install(). If we can get it to build and test, I'm happy to merge despite the problems with other methods. https://github.com/cavalab/srbench/actions/runs/5882205176/job/15953024910?pr=146
Done. I just switched to the conda-forge version of PySR, which also installs the conda-forge version of Julia.
(I was using the pip version only so that things would work for people on Windows and Apple Silicon, but my guess is that they'd run into other issues anyway, so it's not worth it.)
Argh. Tensorflow 1.14 requires Python < 3.8. But conda-forge dropped support for Python 3.7 about a year ago, meaning any recent conda-forge package can’t be installed unless we bump tensorflow. What do you recommend?
Trying to install tensorflow==1.14 via pip instead...
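One possible way around the conflict, assuming the method really needs TensorFlow 1.14: keep the environment on Python 3.7 and take TensorFlow from pip rather than conda-forge. The file below is a sketch with hypothetical names, not an actual srbench environment:

```yaml
# Hypothetical environment for a TF-1.14-bound method: python stays
# pinned to 3.7 and tensorflow comes from pip, side-stepping
# conda-forge's dropped 3.7 support.
name: srbench-dsr
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pip
  - pip:
      - tensorflow==1.14
```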
Probably the easiest thing to do would be to have a separate conda environment per algorithm. I remember now that I actually did that for the PySR paper's fork of srbench: https://github.com/MilesCranmer/pysr_paper/blob/main/benchmark/build_container.md – otherwise it was just too hard to build everything in a single env
ah, that explains why I can't get the environment to resolve for DSR on the new separate environments branch.
Honestly, I think it makes more sense to put energy into resolving the remaining errors on the separate-envs branch. Every method has its own environment there, so we don't get these kinds of issues. The PySR build setup on that branch needs to be updated to use the conda package, and a number of other updates are being introduced, including the requirement that algorithms return sympy-compatible models (see the current tests).
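A per-method environment file might look like the following sketch (the name and pins are hypothetical, not the actual files on the separate-envs branch):

```yaml
# Hypothetical per-method environment, e.g. methods/pysr/environment.yml
name: srbench-pysr
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pysr            # conda-forge build, which also pulls in Julia
  - scikit-learn
  - pandas
```

Keeping one file per method means a broken or stale dependency (sklearn pins, TensorFlow 1.14, etc.) only blocks that method's environment instead of the whole benchmark build.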
Sounds good. Moving to #148
This updates PySR from 0.7 to 0.15. Fixes #145.
Edit: changed to 0.16