dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

[Bug] [Python] Categorical CUDA fails on a data validation check if there's a float column containing only NaNs #10089

Open OneForward opened 8 months ago

OneForward commented 8 months ago

Hi, I found a bug in the Python package that looks exactly the same as this bug in the R package.

Minimal code to reproduce it is shown below:

import pandas as pd 
import numpy as np 
import xgboost as xgb 
print(xgb.__version__)
X = pd.DataFrame({'category_column': [0, 0, 0, -1]}, dtype='category')
X['na_column'] = np.nan 
X = X[['na_column', 'category_column']]
y = pd.DataFrame({'label': [0, 0, 0, 0]})

dtrain = xgb.DMatrix(X, y, enable_categorical=True)
booster = xgb.train({'tree_method': 'hist', 'device': 'cuda'}, dtrain)

Running this script with xgboost package version 2.0.1 on a GPU machine will likely output the following error: Check failed: max_cat + 1 >= n_categories (1 vs. 2) : Maximum cateogry should not be lesser than the total number of categories.

2.0.1
---------------------------------------------------------------------------
XGBoostError                              Traceback (most recent call last)
Cell In[1], line 12
      9 y = pd.DataFrame({'label': [0, 0, 0, 0]})
     11 dtrain = xgb.DMatrix(X, y, enable_categorical=True)
---> 12 booster = xgb.train({'tree_method': 'hist', 'device': 'cuda'}, dtrain)

File ~/python3.11/site-packages/xgboost/core.py:729, in require_keyword_args.<locals>.throw_if.<locals>.inner_f(*args, **kwargs)
    727 for k, arg in zip(sig.parameters, args):
    728     kwargs[k] = arg
--> 729 return func(**kwargs)

File ~/python3.11/site-packages/xgboost/training.py:181, in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks, custom_metric)
    179 if cb_container.before_iteration(bst, i, dtrain, evals):
    180     break
--> 181 bst.update(dtrain, i, obj)
    182 if cb_container.after_iteration(bst, i, dtrain, evals):
    183     break

File ~/python3.11/site-packages/xgboost/core.py:2049, in Booster.update(self, dtrain, iteration, fobj)
   2046 self._assign_dmatrix_features(dtrain)
   2048 if fobj is None:
-> 2049     _check_call(
   2050         _LIB.XGBoosterUpdateOneIter(
   2051             self.handle, ctypes.c_int(iteration), dtrain.handle
   2052         )
   2053     )
   2054 else:
   2055     pred = self.predict(dtrain, output_margin=True, training=True)

File ~/python3.11/site-packages/xgboost/core.py:281, in _check_call(ret)
    270 """Check the return value of C API call
    271 
    272 This function will raise exception when error occurs.
   (...)
    278     return value from API calls
    279 """
    280 if ret != 0:
--> 281     raise XGBoostError(py_str(_LIB.XGBGetLastError()))

XGBoostError: [11:32:42] /workspace/src/tree/updater_gpu_hist.cu:781: Exception in gpu_hist: [11:32:42] /workspace/src/common/categorical.h:82: Check failed: max_cat + 1 >= n_categories (1 vs. 2) : Maximum cateogry should not be lesser than the total number of categories.
Stack trace:
  [bt] (0) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x7f0c9a) [0x7fe652671c9a]
  [bt] (1) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x7f41a2) [0x7fe6526751a2]
  [bt] (2) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x792c67) [0x7fe652613c67]
  [bt] (3) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x83f932) [0x7fe6526c0932]
  [bt] (4) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x83fef2) [0x7fe6526c0ef2]
  [bt] (5) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x41589e) [0x7fe65229689e]
  [bt] (6) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb08679) [0x7fe652989679]
  [bt] (7) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb085c3) [0x7fe6529895c3]
  [bt] (8) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb40297) [0x7fe6529c1297]

Stack trace:
  [bt] (0) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb27f2a) [0x7fe6529a8f2a]
  [bt] (1) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb485c9) [0x7fe6529c95c9]
  [bt] (2) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x460c79) [0x7fe6522e1c79]
  [bt] (3) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x46176c) [0x7fe6522e276c]
  [bt] (4) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x4c54f7) [0x7fe6523464f7]
  [bt] (5) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(XGBoosterUpdateOneIter+0x70) [0x7fe651fe2ef0]
  [bt] (6) ~/python3.11/lib-dynload/../../libffi.so.8(+0xa052) [0x7fe6d35be052]
  [bt] (7) ~/python3.11/lib-dynload/../../libffi.so.8(+0x8925) [0x7fe6d35bc925]
  [bt] (8) ~/python3.11/lib-dynload/../../libffi.so.8(ffi_call+0xde) [0x7fe6d35bd06e]

trivialfis commented 8 months ago

Thank you for raising the issue. I will look into it. I can reproduce it in 2.0, but not with the latest.
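
For anyone who has to stay on 2.0.x, one possible stopgap, not confirmed by the maintainers in this thread, is to drop float columns that contain only NaNs before building the DMatrix, since the title identifies such a column as the trigger. The sketch below reuses the frame from the report. The helper name drop_all_nan_columns is hypothetical, and it is only an assumption that removing the column avoids the failing check.

```python
import numpy as np
import pandas as pd


def drop_all_nan_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame without columns whose values are all missing.

    Hypothetical helper for illustration; it is not part of xgboost.
    """
    return df.dropna(axis="columns", how="all")


# Same frame as in the reproduction script above.
X = pd.DataFrame({'category_column': [0, 0, 0, -1]}, dtype='category')
X['na_column'] = np.nan
X = X[['na_column', 'category_column']]

X_clean = drop_all_nan_columns(X)
print(list(X_clean.columns))  # ['category_column']

# Assumption: X_clean can then be passed to xgboost as in the report, e.g.
#   dtrain = xgb.DMatrix(X_clean, y, enable_categorical=True)
# This step is left out so the sketch runs without a GPU.
```

Dropping the column changes the feature set, so the same filter would need to be applied to any data passed to predict later. Since the failure is reported as not reproducible with the latest code, upgrading may make this unnecessary.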