Open jameslamb opened 4 days ago
Thank you for sharing and volunteering!
> I'd be happy to try to help with this over the next week if you'd like.
Yes, please let me know if there's anything I can help with. I can handle the C++ changes if needed; some checks are done inside libxgboost, and somehow the error message requirements from sklearn have changed.
Description
The Python package's tests fail with the latest `scikit-learn` nightlies (v1.6.dev0).

All the failures appear to be from the estimator checks `scikit-learn` ships to help projects test compliance with `scikit-learn` API expectations. Stuff like this (full logs):
```text
============================= test session starts ==============================
platform darwin -- Python 3.11.9, pytest-8.2.2, pluggy-1.5.0
rootdir: /Users/jlamb/repos/xgboost/tests
configfile: pytest.ini
plugins: cov-5.0.0, hypothesis-6.115.2
collected 110 items

tests/python/test_with_sklearn.py ...................................... [ 34%]
.....................F.......F.F.F.............F...............FF....... [100%]

=================================== FAILURES ===================================
_ test_estimator_reg[XGBRegressor(base_score=None,booster=None,callbacks=None,colsample_bylevel=None,colsample_bynode=None,colsample_bytree=None,device=None,early_stopping_rounds=None,enable_categorical=False,eval_metric=None,feature_types=None,gamma=None,grow_policy=None,importance_type=None,interaction_constraints=None,learning_rate=None,max_bin=None,max_cat_threshold=None,max_cat_to_onehot=None,max_delta_step=None,max_depth=None,max_leaves=None,min_child_weight=None,missing=nan,monotone_constraints=None,multi_strategy=None,n_estimators=None,n_jobs=None,num_parallel_tree=None,random_state=None,...)-check_n_features_in_after_fitting] _
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/sklearn/utils/estimator_checks.py:3974: in check_n_features_in_after_fitting
    callable_method(X_bad)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/core.py:775: in inner_f
    return func(**kwargs)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/sklearn.py:1225: in predict
    predts = self.get_booster().inplace_predict(
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/core.py:775: in inner_f
    return func(**kwargs)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/core.py:2642: in inplace_predict
    raise ValueError(
E   ValueError: Feature shape mismatch, expected: 4, got 1

The above exception was the direct cause of the following exception:
tests/python/test_with_sklearn.py:1349: in test_estimator_reg
    check(estimator)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/sklearn/utils/_testing.py:140: in wrapper
    return fn(*args, **kwargs)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/sklearn/utils/estimator_checks.py:3971: in check_n_features_in_after_fitting
    with raises(
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/sklearn/utils/_testing.py:1076: in __exit__
    raise AssertionError(err_msg) from exc_value
E   AssertionError: `XGBRegressor.predict()` does not check for consistency between input number
E   of features with XGBRegressor.fit(), via the `n_features_in_` attribute.
E   You might want to use `sklearn.utils.validation.validate_data` instead
E   of `check_array` in `XGBRegressor.fit()` and XGBRegressor.predict()`. This can be done
E   like the following:
E   from sklearn.utils.validation import validate_data
E   ...
E   class MyEstimator(BaseEstimator):
E       ...
E       def fit(self, X, y):
E           X, y = validate_data(self, X, y, ...)
E           ...
E           return self
E       ...
E       def predict(self, X):
E           X = validate_data(self, X, ..., reset=False)
E           ...
E           return X
_ test_estimator_reg[XGBRegressor(base_score=None,booster=None,callbacks=None,colsample_bylevel=None,colsample_bynode=None,colsample_bytree=None,device=None,early_stopping_rounds=None,enable_categorical=False,eval_metric=None,feature_types=None,gamma=None,grow_policy=None,importance_type=None,interaction_constraints=None,learning_rate=None,max_bin=None,max_cat_threshold=None,max_cat_to_onehot=None,max_delta_step=None,max_depth=None,max_leaves=None,min_child_weight=None,missing=nan,monotone_constraints=None,multi_strategy=None,n_estimators=None,n_jobs=None,num_parallel_tree=None,random_state=None,...)-check_complex_data] _
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/sklearn/utils/estimator_checks.py:1239: in check_complex_data
    estimator.fit(X, y)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/core.py:775: in inner_f
    return func(**kwargs)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/sklearn.py:1118: in fit
    train_dmatrix, evals = _wrap_evaluation_matrices(
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/sklearn.py:605: in _wrap_evaluation_matrices
    train_dmatrix = create_dmatrix(
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/sklearn.py:1040: in _create_dmatrix
    return QuantileDMatrix(
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/core.py:775: in inner_f
    return func(**kwargs)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/core.py:1636: in __init__
    self._init(
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/core.py:1695: in _init
    it.reraise()
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/core.py:618: in reraise
    raise exc  # pylint: disable=raising-bad-type
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/core.py:599: in _handle_exception
    return fn()
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/xgboost/core.py:686: in
```

Reproducible Example
On an M2 Mac, in a Python 3.11.9 conda environment, built the Python package from source.
Installed the latest `scikit-learn` nightlies.

Saw the failures reported above.

Repeated that same process but with the latest release of `scikit-learn`.

All tests passed.
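For anyone following along, here is a rough, stdlib-only sketch of why `check_n_features_in_after_fitting` can fail even though `predict()` does raise a `ValueError`. My understanding is that the check also matches the error text against a pattern, so a differently worded message is treated as a failure. The pattern below is an assumption about the wording `scikit-learn` looks for (it may differ between versions), and `ToyRegressor` and `raised_message` are hypothetical names for illustration, not xgboost or `scikit-learn` code.

```python
import re

# Message xgboost raised in the logs reported for this issue.
XGB_MESSAGE = "Feature shape mismatch, expected: 4, got 1"

# ASSUMPTION: a pattern in the style the scikit-learn check matches against;
# the exact wording may differ between scikit-learn versions.
SKLEARN_PATTERN = r"X has 1 features, but \w+ is expecting 4 features as input"


class ToyRegressor:
    """Hypothetical estimator (not xgboost code) mimicking the n_features_in_ check."""

    def fit(self, X, y):
        # Remember how many features were seen during fit.
        self.n_features_in_ = len(X[0])
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Compare the incoming feature count against the one recorded in fit.
        n_features = len(X[0])
        if n_features != self.n_features_in_:
            raise ValueError(
                f"X has {n_features} features, but {type(self).__name__} "
                f"is expecting {self.n_features_in_} features as input."
            )
        return [self.mean_ for _ in X]


def raised_message(estimator, X_bad):
    """Return the ValueError message from predict(X_bad), or None if nothing is raised."""
    try:
        estimator.predict(X_bad)
    except ValueError as exc:
        return str(exc)
    return None


est = ToyRegressor().fit([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], [0.0, 1.0])
toy_message = raised_message(est, [[1.0], [2.0]])

xgb_matches = re.search(SKLEARN_PATTERN, XGB_MESSAGE) is not None
toy_matches = re.search(SKLEARN_PATTERN, toy_message) is not None
print(xgb_matches, toy_matches)  # False True
```

If that assumption holds, the fix is less about adding a check (xgboost already raises) and more about where the check happens and how the message is worded.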
Notes
I found this while testing against this in-progress `scikit-learn` branch: https://github.com/scikit-learn/scikit-learn/pull/28901#discussion_r1748413039

@trivialfis @hcho3 I'd be happy to try to help with this over the next week if you'd like. I'm familiar with some of the changes in `scikit-learn` from this related work we've been doing in `lightgbm`: