automl / jahs_bench_201

The first collection of surrogate benchmarks for Joint Architecture and Hyperparameter Search.
https://automl.github.io/jahs_bench_201/
MIT License

Incompatible checksums error #1

Closed: webalorn closed this issue 2 years ago

webalorn commented 2 years ago

Problem: an unexpected error occurs when running the commands from the README.

Environment:

Steps to reproduce

conda create -n jahs_err python=3.7.13
conda activate jahs_err

git clone --recurse-submodules -- git@github.com:automl/jahs_bench_mf
cd jahs_bench_mf
pip install .

python JAHS-Bench-MF/jahs_bench/public_api.py

Error

Attempting to read surrogate model from: JAHS-Bench-MF/surrogates/thesis_cifar10
Traceback (most recent call last):
  File "JAHS-Bench-MF/jahs_bench/public_api.py", line 46, in <module>
    b = Benchmark(model_path=model_path)
  File "JAHS-Bench-MF/jahs_bench/public_api.py", line 23, in __init__
    self.surrogate = XGBSurrogate.load(model_path)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.7/site-packages/jahs_bench/lib/surrogate.py", line 396, in load
    params: dict = joblib.load(outdir / cls.__params_filename)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 587, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 506, in _unpickle
    obj = unpickler.load()
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.7/pickle.py", line 1436, in load_reduce
    stack[-1] = func(*args)
  File "stringsource", line 6, in ConfigSpace.hyperparameters.__pyx_unpickle_CategoricalHyperparameter
_pickle.PickleError: Incompatible checksums (58514084 vs 0xea77850 = (_choices_set, choices, choices_vector, default_value, meta, name, normalized_default_value, num_choices, probabilities, weights))

Additional information

The same error occurs with the latest Python version (3.10) and with 3.7.5.

eddiebergman commented 2 years ago

Hi @webalorn,

Could you post the output of pip list? ConfigSpace recently released v0.5.0, and I assume this surrogate was made with ConfigSpace v0.4.x. You can try pip install ConfigSpace==0.4.21 and then run it again. I would keep decrementing the version until it works.
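Some background on why a ConfigSpace upgrade produces "Incompatible checksums": the traceback ends in `__pyx_unpickle_CategoricalHyperparameter`, a helper that Cython generates for compiled classes. The pickle stores a checksum derived from the class's attribute names, and loading refuses to continue when the installed class has a different set of attributes, which is what happens when a new release changes the fields. The pure-Python sketch below only imitates that guard as an analogy; `layout_checksum`, `dump_state`, `load_state` and the SHA-256 fingerprint are hypothetical and are not Cython's actual implementation.

```python
import hashlib
import pickle


class IncompatibleLayoutError(Exception):
    """Raised when pickled state was written by a class with a different field layout."""


def layout_checksum(fields):
    # Hypothetical fingerprint: a hash of the sorted attribute names (analogy only).
    joined = " ".join(sorted(fields))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:7]


def dump_state(obj, fields):
    # Store the layout checksum next to the attribute values.
    state = {name: getattr(obj, name) for name in fields}
    return pickle.dumps((layout_checksum(fields), state))


def load_state(cls, blob, fields):
    # `fields` describes the class layout as it is installed *now*.
    stored, state = pickle.loads(blob)
    expected = layout_checksum(fields)
    if stored != expected:
        raise IncompatibleLayoutError(
            f"Incompatible checksums ({stored} vs {expected} = {tuple(sorted(fields))})"
        )
    obj = cls.__new__(cls)
    for name, value in state.items():
        setattr(obj, name, value)
    return obj


class Hyperparameter:
    pass


old = Hyperparameter()
old.name, old.choices = "optimizer", ("SGD", "Adam")
blob = dump_state(old, ["name", "choices"])

# Same layout as when it was pickled: loads fine.
same = load_state(Hyperparameter, blob, ["name", "choices"])

# A hypothetical newer release that added a field: the checksum no longer matches.
try:
    load_state(Hyperparameter, blob, ["name", "choices", "weights"])
except IncompatibleLayoutError as err:
    print(err)
```

The point of the analogy is that nothing in the pickled data is wrong; the class it must be rebuilt into has changed, so the only fixes are matching the library version or re-creating the pickle.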

You can list the available versions with pip install ConfigSpace==. Once you find the one that works, it'd be great if you could post it here so the requirements can be fixed. If it doesn't work, then I'm not too sure what else might work.
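When reporting a working combination, only a handful of packages matter. A minimal sketch for collecting their installed versions from Python, assuming Python 3.8+ for `importlib.metadata` (on 3.7 the `importlib_metadata` backport provides the same functions); `report_versions` is a hypothetical helper, and the package list is taken from the libraries that appear in the tracebacks in this thread:

```python
from importlib import metadata  # Python 3.8+; on 3.7 use the importlib_metadata backport


def report_versions(packages):
    """Return {package: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Libraries involved in unpickling the surrogate, per the tracebacks above.
    for name, version in report_versions(["ConfigSpace", "scikit-learn", "xgboost", "joblib"]).items():
        print(f"{name:<15} {version or 'not installed'}")
```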

Best, Eddie

webalorn commented 2 years ago

Hello. ConfigSpace==0.4.21 works. Now there is another error (xgboost.core.XGBoostError: [17:44:56] /Users/runner/work/xgboost/xgboost/src/tree/tree_updater.cc:20: Unknown tree updater grow_gpu_hist). Should I post it here or in a separate issue?

eddiebergman commented 2 years ago

I imagine it's a similar problem of finding the right version of XGBoost, unfortunately. I'd expect most of these errors to be related to versioning. The problem with hosting pickled objects on GitHub is that a pickled object will most likely fail to load if the versions of the libraries it was pickled with differ from those it is unpickled with. In this case, the surrogate model is that pickled object.
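One way to make such failures explicit is to record the library versions next to the pickle when saving it, and to compare them before unpickling, so a mismatch produces a readable message instead of an obscure error from inside pickle. A minimal sketch, assuming the versions are passed in as a plain dict (for example collected with `importlib.metadata`); `save_with_versions`, `load_with_version_check` and the `.versions.json` sidecar file are hypothetical and not part of jahs_bench:

```python
import json
import pickle
from pathlib import Path


class VersionMismatchError(RuntimeError):
    """Raised when the current environment differs from the one that wrote the pickle."""


def _sidecar(path):
    # Hypothetical convention: model.pkl -> model.pkl.versions.json
    return path.with_name(path.name + ".versions.json")


def save_with_versions(obj, path, versions):
    """Pickle `obj` and record the library versions it was created with."""
    path = Path(path)
    path.write_bytes(pickle.dumps(obj))
    _sidecar(path).write_text(json.dumps(versions, sort_keys=True))


def load_with_version_check(path, current_versions):
    """Unpickle `path` only if every recorded version matches `current_versions`."""
    path = Path(path)
    recorded = json.loads(_sidecar(path).read_text())
    # Compare before unpickling, so a mismatch is reported clearly
    # instead of failing somewhere inside pickle.
    mismatches = sorted(
        (name, wanted, current_versions.get(name))
        for name, wanted in recorded.items()
        if current_versions.get(name) != wanted
    )
    if mismatches:
        details = "; ".join(
            f"{name}: pickled with {wanted}, found {found}"
            for name, wanted, found in mismatches
        )
        raise VersionMismatchError(details)
    return pickle.loads(path.read_bytes())
```

Exact equality is deliberately strict here; a looser policy (for example, only the major and minor version must match) is a design choice that depends on how stable each library's pickle format is.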

I would keep repeating this procedure until it works. Once you have a set of versions that works, please post the output of pip list along with the packages whose versions you had to change, and I'll turn them into hard (pinned) requirements so that it keeps working.
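Turning a posted pip list table into pinned requirements is mechanical. Note that pip freeze already prints the `name==version` format directly; the sketch below is for the case where only the two-column pip list output is available. `parse_pip_list` and `pin_requirements` are hypothetical helpers, and rows with an extra location column (editable installs) are skipped for simplicity:

```python
def parse_pip_list(text):
    """Parse the two-column table printed by `pip list` into {package: version}."""
    versions = {}
    for line in text.strip().splitlines():
        parts = line.split()
        # Skip the header row, the dashed separator row, and rows with extra columns.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        versions[parts[0]] = parts[1]
    return versions


def pin_requirements(versions):
    """Render exact pins, one `name==version` per line, sorted case-insensitively."""
    return "\n".join(
        f"{name}=={version}"
        for name, version in sorted(versions.items(), key=lambda item: item[0].lower())
    )


sample = """
Package         Version
--------------- ---------
ConfigSpace     0.4.21
xgboost         1.5.0
joblib          1.1.0
"""
print(pin_requirements(parse_pip_list(sample)))
# ConfigSpace==0.4.21
# joblib==1.1.0
# xgboost==1.5.0
```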

@NeoChaos12 @Archit Bansal, pickled objects unfortunately require fixed requirements, and they can break every time the code is updated (if the surrogate model interacts with your own code and expects some function to exist).

Best, Eddie

eddiebergman commented 2 years ago

Side note: You should only have to try out the packages listed here:

webalorn commented 2 years ago

I did that, but it still doesn't work. I downgraded every package I could; here is the output of pip list:

Package         Version
--------------- ---------
certifi         2020.6.20
ConfigSpace     0.4.21
Cython          0.29.28
JAHS-Bench-MF   0.0.2
joblib          1.1.0
numpy           1.21.0
pandas          1.4.1
pip             21.2.4
pyparsing       3.0.7
python-dateutil 2.8.2
pytz            2022.1
scikit-learn    1.0
scipy           1.8.0
setuptools      58.0.4
six             1.16.0
threadpoolctl   3.1.0
wheel           0.37.1
xgboost         1.5.0

Pickled objects are indeed a problem for code that could be used in other repositories. Even if we know the right versions, they can clash with other code or with other pickled objects (this just happened with jahs_bench_mf and another one).

Full log

/usr/local/anaconda3/envs/jahs_err/lib/python3.10/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
Attempting to read surrogate model from: /Users/webalorn/Documents/ens/stage-m1/jahs_bench_mf/JAHS-Bench-MF/surrogates/thesis_cifar10
/usr/local/anaconda3/envs/jahs_err/lib/python3.10/site-packages/sklearn/base.py:324: UserWarning: Trying to unpickle estimator OneHotEncoder from version 1.0.1 when using version 1.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/usr/local/anaconda3/envs/jahs_err/lib/python3.10/site-packages/sklearn/base.py:324: UserWarning: Trying to unpickle estimator ColumnTransformer from version 1.0.1 when using version 1.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
  warnings.warn(
[19:02:15] WARNING: /Users/travis/build/dmlc/xgboost/src/gbm/gbtree.cc:386: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
Traceback (most recent call last):
  File "/Users/webalorn/Documents/ens/stage-m1/jahs_bench_mf/JAHS-Bench-MF/jahs_bench/public_api.py", line 46, in <module>
    b = Benchmark(model_path=model_path)
  File "/Users/webalorn/Documents/ens/stage-m1/jahs_bench_mf/JAHS-Bench-MF/jahs_bench/public_api.py", line 23, in __init__
    self.surrogate = XGBSurrogate.load(model_path)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.10/site-packages/jahs_bench/lib/surrogate.py", line 403, in load
    model = joblib.load(outdir / cls.__model_filename)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 587, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 506, in _unpickle
    obj = unpickler.load()
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.10/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 331, in load_build
    Unpickler.load_build(self)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.10/pickle.py", line 1718, in load_build
    setstate(state)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.10/site-packages/xgboost/core.py", line 1451, in __setstate__
    _check_call(
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.10/site-packages/xgboost/core.py", line 218, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [19:02:15] /Users/travis/build/dmlc/xgboost/src/tree/tree_updater.cc:20: Unknown tree updater grow_gpu_hist
Stack trace:
  [bt] (0) 1   libxgboost.dylib                    0x000000011823f1f4 dmlc::LogMessageFatal::~LogMessageFatal() + 116
  [bt] (1) 2   libxgboost.dylib                    0x000000011839c852 xgboost::TreeUpdater::Create(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, xgboost::GenericParameter const*) + 738
  [bt] (2) 3   libxgboost.dylib                    0x00000001182d505e xgboost::gbm::GBTree::LoadConfig(xgboost::Json const&) + 2638
  [bt] (3) 4   libxgboost.dylib                    0x00000001182f45de xgboost::LearnerConfiguration::LoadConfig(xgboost::Json const&) + 814
  [bt] (4) 5   libxgboost.dylib                    0x00000001182f5f02 xgboost::LearnerIO::Load(dmlc::Stream*) + 786
  [bt] (5) 6   libxgboost.dylib                    0x0000000118239bb1 XGBoosterUnserializeFromBuffer + 145
  [bt] (6) 7   libffi.7.dylib                      0x000000010d451ead ffi_call_unix64 + 85
  [bt] (7) 8   ???                                 0x00007ffee3241e90 0x0 + 140732709215888

NeoChaos12 commented 2 years ago

Thanks for pointing out this issue! I will get back to you after running some tests. This looks like a versioning error with pickles to me as well. I am aware of the issues with pickles and am currently looking for a better way to share models that strikes a good balance between compactness, ease of use and performance. I'm open to suggestions, though!
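As one possible direction for sharing models: for the XGBoost part, the library's own Booster.save_model and Booster.load_model, used with a file name ending in .json, store the model in XGBoost's documented JSON format, which the XGBoost documentation recommends over pickle when a model has to be loaded by a different XGBoost version. For the plain hyperparameter dictionary that the traceback shows being read with joblib.load, gzipped JSON is compact and readable from any Python version. A minimal sketch, assuming the dict only holds JSON-serializable values; `save_params` and `load_params` are hypothetical names:

```python
import gzip
import json
from pathlib import Path


def save_params(params, path):
    """Write a dict of plain values (str, int, float, bool, list, dict, None) as gzipped JSON."""
    with gzip.open(Path(path), "wt", encoding="utf-8") as handle:
        json.dump(params, handle, sort_keys=True)


def load_params(path):
    """Read back a dict written by save_params."""
    with gzip.open(Path(path), "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

The trade-off compared with pickle is that tuples come back as lists and custom classes must be converted to plain data first, but no library version is needed to read the file.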

NeoChaos12 commented 2 years ago

A short update: I successfully reproduced your original issue with ConfigSpace==0.5 and will shortly update the requirements to add an upper bound on that package's version. I was, however, unable to reproduce your second error: I used the exact package list you provided and successfully ran the test script. As a note, some of the packages in your package list require python>=3.8, including pandas. I'm currently testing a number of different builds and package combinations to be certain of, at the very least, Python version compatibility. If possible, could you kindly verify that your conda base environment is clean and does not contain conflicting packages?
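Since some pinned packages (pandas among them) need Python 3.8 or newer, an early, explicit check gives a clearer message than a failed install or import. At install time, setuptools' `python_requires` argument lets pip refuse an unsupported interpreter; the sketch below is a runtime guard instead. `require_python` is a hypothetical helper, and the `(3, 8)` default reflects only the pandas requirement mentioned above:

```python
import sys


def require_python(minimum=(3, 8), current=None):
    """Raise a clear error if the interpreter is older than `minimum`."""
    current = tuple(current if current is not None else sys.version_info[:2])
    if current < tuple(minimum):
        found = ".".join(map(str, current))
        needed = ".".join(map(str, minimum))
        raise RuntimeError(
            f"Python {found} found, but the pinned packages need Python >= {needed}"
        )
```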

EDIT: I just noticed that your second error log was generated specifically with Python 3.10, and will investigate that version first.

EDIT 2: I can confirm that a fresh conda environment with python 3.10 and the following package list worked for me:

Package         Version
--------------- ---------
certifi         2020.6.20
ConfigSpace     0.4.21
Cython          0.29.28
JAHS-Bench-MF   0.0.2
joblib          1.1.0
numpy           1.22.3
pandas          1.4.2
pip             21.2.4
pyparsing       3.0.7
python-dateutil 2.8.2
pytz            2022.1
scikit-learn    1.0.2
scipy           1.8.0
setuptools      58.0.4
six             1.16.0
threadpoolctl   3.1.0
wheel           0.37.1
xgboost         1.5.2

NeoChaos12 commented 2 years ago

I will have to put a cap on my testing here. I was able to use the following procedure to successfully run the test script on Python 3.7, 3.8, 3.9 and 3.10:

conda create -n jahs_test python=$PYTHONVER
conda activate jahs_test
cd jahs_bench_mf
pip install .
python JAHS-Bench-MF/jahs_bench/public_api.py
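Repeating the procedure above for several Python versions can be scripted. A minimal sketch, assuming conda is on the PATH and the repository is already cloned; because `conda activate` needs an initialised interactive shell, it uses `conda run -n ENV ...` to execute commands inside each environment. The per-version environment names (`jahs_test_py38` and so on) and the helpers `commands_for` and `run_matrix` are hypothetical:

```python
import subprocess

PYTHON_VERSIONS = ("3.7", "3.8", "3.9", "3.10")


def commands_for(version):
    """Build the test procedure for one Python version as argument lists."""
    env = "jahs_test_py" + version.replace(".", "")  # hypothetical naming scheme
    return [
        ["conda", "create", "-y", "-n", env, f"python={version}"],
        ["conda", "run", "-n", env, "pip", "install", "."],
        ["conda", "run", "-n", env, "python", "JAHS-Bench-MF/jahs_bench/public_api.py"],
    ]


def run_matrix(repo_dir, versions=PYTHON_VERSIONS, dry_run=True):
    """Run (or, with dry_run=True, only print) the procedure for each version.

    Returns {version: True if every command succeeded}.
    """
    results = {}
    for version in versions:
        ok = True
        for cmd in commands_for(version):
            if dry_run:
                print(" ".join(cmd))
                continue
            if subprocess.run(cmd, cwd=repo_dir).returncode != 0:
                ok = False
                break
        results[version] = ok
    return results
```

Usage: `run_matrix("jahs_bench_mf", dry_run=False)` would run the matrix from the cloned repository directory; the default dry run only prints the commands.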

I will close this issue with an update to the version limits of ConfigSpace. If the issue persists on your end, do feel free to mention it and I will re-open the issue.
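In the package metadata, such a limit would be written as a requirement specifier like ConfigSpace>=0.4.21,<0.5 (the exact bounds here are an assumption based on this thread, not the committed requirement). For checking an installed version against a range at runtime, `packaging.specifiers.SpecifierSet` is the robust tool. The dependency-free sketch below only handles plain numeric versions and ignores pre-release suffixes such as `rc1`; `parse_version` and `within` are hypothetical helpers:

```python
import re


def parse_version(text):
    """Parse 'X.Y.Z' into a 3-tuple of ints, padding missing parts with 0.

    Simplification: any non-numeric suffix (e.g. 'rc1') is ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts + [0] * (3 - len(parts)))


def within(version, lower, upper):
    """True if lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


print(within("0.4.21", "0.4.0", "0.5"))  # True
print(within("0.5.0", "0.4.0", "0.5"))   # False
```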