Thanks for catching this. I just posted a fix for this here: be0ae4455e34e1e600ec424596baa86a6a19a07b
The issue was indeed that the XGBoost models ignore the CUDA availability check and use the GPU by default. We'll push the fix to PyPI soon, but in the meantime you can run:
pip install --upgrade "olorenchemengine[full] @ git+https://github.com/Oloren-AI/olorenchemengine.git"
to upgrade to the latest version from master.
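For context, the guard amounts to checking CUDA availability before choosing the XGBoost tree method. A minimal sketch of that kind of check, assuming a torch-based CUDA test (the actual change in the commit above may differ):

```python
import torch
import xgboost as xgb

# Only request the GPU histogram algorithm when a CUDA device is actually
# available; otherwise fall back to the CPU "hist" method so CPU-only
# xgboost builds (e.g. the default macOS wheels) still work.
tree_method = "gpu_hist" if torch.cuda.is_available() else "hist"
model = xgb.XGBRegressor(tree_method=tree_method)
```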
We are implementing a better CI system that will test OCE on many different environments and platforms very soon, which should help catch these issues earlier going forward.
Thanks, that worked. Note that I did have to run "pip uninstall xgboost" followed by "pip install xgboost" to get xgboost working.
Thanks, will add that note to the installation guide
This looks like another case where the code is looking for a GPU.
Lots of output like this:
[CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=7, min_child_weight=3, n_estimators=200, reg_alpha=20, reg_lambda=0, subsample=0.7; total time= 0.0s
ValueError                                Traceback (most recent call last)
Cell In [3], line 5
      1 # This will now use our model manager to test our top models
      2 # and will take around 1-4 hours to run in total for this dataset,
      3 # though it will take more or less time depending on the machine.
----> 5 mm.run(models)

File ~/opt/anaconda3/envs/oce/lib/python3.8/site-packages/olorenchemengine/manager.py:104, in BaseModelManager.run(self, models, return_models)
    102 print("Fitting model")
    103 start = timer()
--> 104 model.fit(*self.dataset.train_dataset)
    105 end = timer()
    106 print("Evaluating model")

File ~/opt/anaconda3/envs/oce/lib/python3.8/site-packages/olorenchemengine/base_class.py:466, in BaseModel.fit(self, X_train, y_train, valid, error_model)
    463 if not valid is None:
    464     X_valid = self.preprocess(X_valid, y_valid, fit=False)
--> 466 self._fit(X_train, y_train)
    468 # Calibrate model
    469 if not valid is None and self.setting == "classification":

File ~/opt/anaconda3/envs/oce/lib/python3.8/site-packages/olorenchemengine/base_class.py:995, in BaseSKLearnModel._fit(self, X_train, y_train)
    993 else:
    994     self.setting = "regression"
--> 995 self.regression_model.fit(X_train, y_train)
    996 self.model = self.regression_model

File ~/opt/anaconda3/envs/oce/lib/python3.8/site-packages/olorenchemengine/basics.py:49, in RandomizedSearchCVModel.fit(self, *args, **kwargs)
     48 def fit(self, *args, **kwargs):
---> 49     super().fit(*args, **kwargs)
     50     self.obj = self.best_estimator_

File ~/opt/anaconda3/envs/oce/lib/python3.8/site-packages/sklearn/model_selection/_search.py:875, in BaseSearchCV.fit(self, X, y, groups, **fit_params)
    869 results = self._format_results(
    870     all_candidate_params, n_splits, all_out, all_more_results
    871 )
    873 return results
--> 875 self._run_search(evaluate_candidates)
    877 # multimetric is determined here because in the case of a callable
    878 # self.scoring the return type is only known after calling
    879 first_test_score = all_out[0]["test_scores"]

File ~/opt/anaconda3/envs/oce/lib/python3.8/site-packages/sklearn/model_selection/_search.py:1753, in RandomizedSearchCV._run_search(self, evaluate_candidates)
   1751 def _run_search(self, evaluate_candidates):
   1752     """Search n_iter candidates from param_distributions"""
-> 1753     evaluate_candidates(
   1754         ParameterSampler(
   1755             self.param_distributions, self.n_iter, random_state=self.random_state
   1756         )
   1757     )

File ~/opt/anaconda3/envs/oce/lib/python3.8/site-packages/sklearn/model_selection/_search.py:852, in BaseSearchCV.fit.<locals>.evaluate_candidates(candidate_params, cv, more_results)
    845 elif len(out) != n_candidates * n_splits:
    846     raise ValueError(
    847         "cv.split and cv.get_n_splits returned "
    848         "inconsistent results. Expected {} "
    849         "splits, got {}".format(n_splits, len(out) // n_candidates)
    850     )
--> 852 _warn_or_raise_about_fit_failures(out, self.error_score)
    854 # For callable self.scoring, the return type is only know after
    855 # calling. If the return type is a dictionary, the error scores
    856 # can now be inserted with the correct key. The type checking
    857 # of out will be done in `_insert_error_scores`.
    858 if callable(self.scoring):

File ~/opt/anaconda3/envs/oce/lib/python3.8/site-packages/sklearn/model_selection/_validation.py:367, in _warn_or_raise_about_fit_failures(results, error_score)
    360 if num_failed_fits == num_fits:
    361     all_fits_failed_message = (
    362         f"\nAll the {num_fits} fits failed.\n"
    363         "It is very likely that your model is misconfigured.\n"
    364         "You can try to debug the error by setting error_score='raise'.\n\n"
    365         f"Below are more details about the failures:\n{fit_errors_summary}"
    366     )
--> 367     raise ValueError(all_fits_failed_message)
    369 else:
    370     some_fits_failed_message = (
    371         f"\n{num_failed_fits} fits failed out of a total of {num_fits}.\n"
    372         "The score on these train-test partitions for these parameters"
        (...)
    376         f"Below are more details about the failures:\n{fit_errors_summary}"
    377     )
ValueError: All the 500 fits failed. It is very likely that your model is misconfigured. You can try to debug the error by setting error_score='raise'.
Below are more details about the failures:
72 fits failed with the following error:
Traceback (most recent call last):
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/core.py", line 575, in inner_f
    return f(**kwargs)
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/sklearn.py", line 961, in fit
    self._Booster = train(
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/core.py", line 575, in inner_f
    return f(**kwargs)
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/training.py", line 181, in train
    bst.update(dtrain, i, obj)
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/core.py", line 1778, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/core.py", line 246, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [12:55:15] /Users/runner/work/xgboost/xgboost/python-package/build/temp.macosx-10.9-x86_64-cpython-37/xgboost/src/gbm/../common/common.h:239: XGBoost version not compiled with GPU support.
Stack trace:
  [bt] (0) 1   libxgboost.dylib   0x0000000153136705 dmlc::LogMessageFatal::~LogMessageFatal() + 117
  [bt] (1) 2   libxgboost.dylib   0x00000001531eb550 xgboost::gbm::GBTree::ConfigureUpdaters() + 512
  [bt] (2) 3   libxgboost.dylib   0x00000001531eb0e7 xgboost::gbm::GBTree::Configure(std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&) + 1143
  [bt] (3) 4   libxgboost.dylib   0x000000015320a1e9 xgboost::LearnerConfiguration::Configure() + 1177
  [bt] (4) 5   libxgboost.dylib   0x000000015320a4d9 xgboost::LearnerImpl::UpdateOneIter(int, std::__1::shared_ptr<xgboost::DMatrix>) + 105
  [bt] (5) 6   libxgboost.dylib   0x000000015313a54a XGBoosterUpdateOneIter + 138
  [bt] (6) 7   libffi.7.dylib     0x00000001093e4ead ffi_call_unix64 + 85
  [bt] (7) 8   ???                0x0000000301b70220 0x0 + 12913672736
236 fits failed with the following error:
Traceback (most recent call last):
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/core.py", line 575, in inner_f
    return f(**kwargs)
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/sklearn.py", line 961, in fit
    self._Booster = train(
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/core.py", line 575, in inner_f
    return f(**kwargs)
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/training.py", line 181, in train
    bst.update(dtrain, i, obj)
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/core.py", line 1778, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/core.py", line 246, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [12:55:16] /Users/runner/work/xgboost/xgboost/python-package/build/temp.macosx-10.9-x86_64-cpython-37/xgboost/src/gbm/../common/common.h:239: XGBoost version not compiled with GPU support.
Stack trace:
  [bt] (0) 1   libxgboost.dylib   0x0000000153136705 dmlc::LogMessageFatal::~LogMessageFatal() + 117
  [bt] (1) 2   libxgboost.dylib   0x00000001531eb550 xgboost::gbm::GBTree::ConfigureUpdaters() + 512
  [bt] (2) 3   libxgboost.dylib   0x00000001531eb0e7 xgboost::gbm::GBTree::Configure(std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&) + 1143
  [bt] (3) 4   libxgboost.dylib   0x000000015320a1e9 xgboost::LearnerConfiguration::Configure() + 1177
  [bt] (4) 5   libxgboost.dylib   0x000000015320a4d9 xgboost::LearnerImpl::UpdateOneIter(int, std::__1::shared_ptr<xgboost::DMatrix>) + 105
  [bt] (5) 6   libxgboost.dylib   0x000000015313a54a XGBoosterUpdateOneIter + 138
  [bt] (6) 7   libffi.7.dylib     0x00000001093e4ead ffi_call_unix64 + 85
  [bt] (7) 8   ???                0x0000000301b70220 0x0 + 12913672736

192 fits failed with the following error:
Traceback (most recent call last):
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/core.py", line 575, in inner_f
    return f(**kwargs)
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/sklearn.py", line 961, in fit
    self._Booster = train(
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/core.py", line 575, in inner_f
    return f(**kwargs)
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/training.py", line 181, in train
    bst.update(dtrain, i, obj)
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/core.py", line 1778, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "/Users/pwalters/opt/anaconda3/envs/oce/lib/python3.8/site-packages/xgboost/core.py", line 246, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [12:55:17] /Users/runner/work/xgboost/xgboost/python-package/build/temp.macosx-10.9-x86_64-cpython-37/xgboost/src/gbm/../common/common.h:239: XGBoost version not compiled with GPU support.
Stack trace:
  [bt] (0) 1   libxgboost.dylib   0x0000000153136705 dmlc::LogMessageFatal::~LogMessageFatal() + 117
  [bt] (1) 2   libxgboost.dylib   0x00000001531eb550 xgboost::gbm::GBTree::ConfigureUpdaters() + 512
  [bt] (2) 3   libxgboost.dylib   0x00000001531eb0e7 xgboost::gbm::GBTree::Configure(std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&) + 1143
  [bt] (3) 4   libxgboost.dylib   0x000000015320a1e9 xgboost::LearnerConfiguration::Configure() + 1177
  [bt] (4) 5   libxgboost.dylib   0x000000015320a4d9 xgboost::LearnerImpl::UpdateOneIter(int, std::__1::shared_ptr<xgboost::DMatrix>) + 105
  [bt] (5) 6   libxgboost.dylib   0x000000015313a54a XGBoosterUpdateOneIter + 138
  [bt] (6) 7   libffi.7.dylib     0x00000001093e4ead ffi_call_unix64 + 85
  [bt] (7) 8   ???                0x0000000301b70220 0x0 + 12913672736
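For what it's worth, the underlying failure can be reproduced outside of OCE with plain xgboost. A minimal sketch with synthetic data, assuming an xgboost 1.x build without GPU support (as in the traceback above); tree_method is the only setting that matters here:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)
y = np.random.rand(100)

# Requesting the GPU algorithm on a CPU-only build raises the same
# "XGBoost version not compiled with GPU support" error seen above.
try:
    xgb.XGBRegressor(tree_method="gpu_hist", n_estimators=10).fit(X, y)
except xgb.core.XGBoostError as err:
    print(err)

# The CPU histogram method trains fine on the same data.
xgb.XGBRegressor(tree_method="hist", n_estimators=10).fit(X, y)
```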
Get the list of models and sort by their RMSE performance metrics:
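A minimal sketch of that step, assuming the model manager exposes its results as a pandas DataFrame; the get_model_database accessor and the metric column name below are assumptions and may be named differently in the installed OCE version:

```python
# mm is the BaseModelManager from the notebook above.
results = mm.get_model_database()  # assumed accessor; check the OCE docs
# Sort ascending so the lowest-RMSE model comes first.
best_first = results.sort_values("Root Mean Squared Error", ascending=True)
print(best_first.head())
```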
We see that the best model now outperforms the published models.