Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
Hi, I found a Python-version bug exactly the same as this R-version bug.
Minimal code to reproduce:
import pandas as pd
import numpy as np
import xgboost as xgb
print(xgb.__version__)
X = pd.DataFrame({'category_column': [0, 0, 0, -1]}, dtype='category')
X['na_column'] = np.nan
X = X[['na_column', 'category_column']]
y = pd.DataFrame({'label': [0, 0, 0, 0]})
dtrain = xgb.DMatrix(X, y, enable_categorical=True)
booster = xgb.train({'tree_method': 'hist', 'device': 'cuda'}, dtrain)
Running this script with xgboost package version 2.0.1 on a GPU machine should fail with the following error:
Check failed: max_cat + 1 >= n_categories (1 vs. 2) : Maximum cateogry should not be lesser than the total number of categories.
Full output:
2.0.1
---------------------------------------------------------------------------
XGBoostError Traceback (most recent call last)
Cell In[1], line 12
9 y = pd.DataFrame({'label': [0, 0, 0, 0]})
11 dtrain = xgb.DMatrix(X, y, enable_categorical=True)
---> 12 booster = xgb.train({'tree_method': 'hist', 'device': 'cuda'}, dtrain)
File ~/python3.11/site-packages/xgboost/core.py:729, in require_keyword_args.<locals>.throw_if.<locals>.inner_f(*args, **kwargs)
727 for k, arg in zip(sig.parameters, args):
728 kwargs[k] = arg
--> 729 return func(**kwargs)
File ~/python3.11/site-packages/xgboost/training.py:181, in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks, custom_metric)
179 if cb_container.before_iteration(bst, i, dtrain, evals):
180 break
--> 181 bst.update(dtrain, i, obj)
182 if cb_container.after_iteration(bst, i, dtrain, evals):
183 break
File ~/python3.11/site-packages/xgboost/core.py:2049, in Booster.update(self, dtrain, iteration, fobj)
2046 self._assign_dmatrix_features(dtrain)
2048 if fobj is None:
-> 2049 _check_call(
2050 _LIB.XGBoosterUpdateOneIter(
2051 self.handle, ctypes.c_int(iteration), dtrain.handle
2052 )
2053 )
2054 else:
2055 pred = self.predict(dtrain, output_margin=True, training=True)
File ~/python3.11/site-packages/xgboost/core.py:281, in _check_call(ret)
270 """Check the return value of C API call
271
272 This function will raise exception when error occurs.
(...)
278 return value from API calls
279 """
280 if ret != 0:
--> 281 raise XGBoostError(py_str(_LIB.XGBGetLastError()))
XGBoostError: [11:32:42] /workspace/src/tree/updater_gpu_hist.cu:781: Exception in gpu_hist: [11:32:42] /workspace/src/common/categorical.h:82: Check failed: max_cat + 1 >= n_categories (1 vs. 2) : Maximum cateogry should not be lesser than the total number of categories.
Stack trace:
[bt] (0) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x7f0c9a) [0x7fe652671c9a]
[bt] (1) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x7f41a2) [0x7fe6526751a2]
[bt] (2) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x792c67) [0x7fe652613c67]
[bt] (3) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x83f932) [0x7fe6526c0932]
[bt] (4) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x83fef2) [0x7fe6526c0ef2]
[bt] (5) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x41589e) [0x7fe65229689e]
[bt] (6) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb08679) [0x7fe652989679]
[bt] (7) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb085c3) [0x7fe6529895c3]
[bt] (8) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb40297) [0x7fe6529c1297]
Stack trace:
[bt] (0) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb27f2a) [0x7fe6529a8f2a]
[bt] (1) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb485c9) [0x7fe6529c95c9]
[bt] (2) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x460c79) [0x7fe6522e1c79]
[bt] (3) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x46176c) [0x7fe6522e276c]
[bt] (4) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x4c54f7) [0x7fe6523464f7]
[bt] (5) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(XGBoosterUpdateOneIter+0x70) [0x7fe651fe2ef0]
[bt] (6) ~/python3.11/lib-dynload/../../libffi.so.8(+0xa052) [0x7fe6d35be052]
[bt] (7) ~/python3.11/lib-dynload/../../libffi.so.8(+0x8925) [0x7fe6d35bc925]
[bt] (8) ~/python3.11/lib-dynload/../../libffi.so.8(ffi_call+0xde) [0x7fe6d35bd06e]
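For anyone triaging this, here is a minimal sketch that inspects how pandas encodes the categorical column from the reproducer. It uses only pandas and NumPy and does not call XGBoost. The helper name `describe_categorical` is hypothetical, and it is an assumption (not confirmed here) that these integer codes are what the failing check is comparing against.

```python
import numpy as np
import pandas as pd


def describe_categorical(df, column):
    """Summarize the pandas encoding of one categorical column.

    Hypothetical helper for triage; not part of xgboost or pandas.
    """
    col = df[column]
    codes = col.cat.codes  # integer codes; pandas uses -1 for missing values
    return {
        "categories": col.cat.categories.tolist(),
        "codes": codes.tolist(),
        "n_categories": len(col.cat.categories),
        "max_code": int(codes.max()),
    }


# Same frame as in the reproducer above.
X = pd.DataFrame({'category_column': [0, 0, 0, -1]}, dtype='category')
X['na_column'] = np.nan
X = X[['na_column', 'category_column']]

summary = describe_categorical(X, 'category_column')
print(summary)
print(X.dtypes)
```

On the pandas side the value -1 is an ordinary category (it gets code 0), not a missing-value marker, so the column has two categories and a maximum code of 1.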