dask / dask-xgboost

BSD 3-Clause "New" or "Revised" License

XGBoostError: Boolean is not supported #57

Open vilmara opened 4 years ago

vilmara commented 4 years ago

Hi all, I am running the RAPIDS NYCTaxi notebook via the Docker image rapidsai/rapidsai:0.10-cuda10.1-runtime-ubuntu18.04, but I am getting the error below at the training step. Any tips on how to fix it?

import dask_xgboost as dxgb_gpu

params = {
  'learning_rate': 0.3,
  'max_depth': 8,
  'objective': 'reg:squarederror',
  'subsample': 0.6,
  'gamma': 1,
  'silent': True,
  'verbose_eval': True,
  'tree_method': 'gpu_hist',
  'n_gpus': 1
}

trained_model = dxgb_gpu.train(client, params, X_train, Y_train, num_boost_round=100)

Traceback:

XGBoostError                              Traceback (most recent call last)
<timed exec> in <module>

/opt/conda/envs/rapids/lib/python3.6/site-packages/dask_xgboost/core.py in train(client, params, data, labels, dmatrix_kwargs, **kwargs)
    233     """
    234     return client.sync(_train, client, params, data,
--> 235                        labels, dmatrix_kwargs, **kwargs)
    236 
    237 

/opt/conda/envs/rapids/lib/python3.6/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    760         else:
    761             return sync(
--> 762                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    763             )
    764 

/opt/conda/envs/rapids/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    331     if error[0]:
    332         typ, exc, tb = error[0]
--> 333         raise exc.with_traceback(tb)
    334     else:
    335         return result[0]

/opt/conda/envs/rapids/lib/python3.6/site-packages/distributed/utils.py in f()
    315             if callback_timeout is not None:
    316                 future = gen.with_timeout(timedelta(seconds=callback_timeout), future)
--> 317             result[0] = yield future
    318         except Exception as exc:
    319             error[0] = sys.exc_info()

/opt/conda/envs/rapids/lib/python3.6/site-packages/tornado/gen.py in run(self)
    733 
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

/opt/conda/envs/rapids/lib/python3.6/site-packages/tornado/gen.py in run(self)
    740                     if exc_info is not None:
    741                         try:
--> 742                             yielded = self.gen.throw(*exc_info)  # type: ignore
    743                         finally:
    744                             # Break up a reference to itself

/opt/conda/envs/rapids/lib/python3.6/site-packages/dask_xgboost/core.py in _train(client, params, data, labels, dmatrix_kwargs, **kwargs)
    193 
    194     # Get the results, only one will be non-None
--> 195     results = yield client._gather(futures)
    196     result = [v for v in results if v]
    197     if not params.get('dask_all_models', False):

/opt/conda/envs/rapids/lib/python3.6/site-packages/tornado/gen.py in run(self)
    733 
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

/opt/conda/envs/rapids/lib/python3.6/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1699                             exc = CancelledError(key)
   1700                         else:
-> 1701                             raise exception.with_traceback(traceback)
   1702                         raise exc
   1703                     if errors == "skip":

/opt/conda/envs/rapids/lib/python3.6/site-packages/dask_xgboost/core.py in train_part()
     97         if dmatrix_kwargs is None:
     98             dmatrix_kwargs = {}
---> 99         dtrain = xgb.DMatrix(data, labels, **dmatrix_kwargs)
    100 
    101     elif labels[0] is None and isinstance(data[0], xgb.DMatrix):

/opt/conda/envs/rapids/lib/python3.6/site-packages/xgboost/core.py in __init__()
    512             self._init_from_dt(data, nthread)
    513         elif _use_columnar_initializer(data):
--> 514             self._init_from_columnar(data, missing)
    515         else:
    516             try:

/opt/conda/envs/rapids/lib/python3.6/site-packages/xgboost/core.py in _init_from_columnar()
    651             _LIB.XGDMatrixCreateFromArrayInterfaces(
    652                 interfaces, ctypes.c_int32(has_missing),
--> 653                 ctypes.c_float(missing), ctypes.byref(handle)))
    654         self.handle = handle
    655 

/opt/conda/envs/rapids/lib/python3.6/site-packages/xgboost/core.py in _check_call()
    199     """
    200     if ret != 0:
--> 201         raise XGBoostError(py_str(_LIB.XGBGetLastError()))
    202 
    203 

XGBoostError: [16:36:13] /conda/conda-bld/xgboost_1571337679414/work/src/data/simple_csr_source.cu:161: Boolean is not supported.
Stack trace:
  [bt] (0) /opt/conda/envs/rapids/lib/libxgboost.so(+0xc9594) [0x7f80d2a83594]
  [bt] (1) /opt/conda/envs/rapids/lib/libxgboost.so(xgboost::data::SimpleCSRSource::FromDeviceColumnar(std::vector<xgboost::Json, std::allocator<xgboost::Json> > const&, bool, float)+0x743) [0x7f80d2c66443]
  [bt] (2) /opt/conda/envs/rapids/lib/libxgboost.so(xgboost::data::SimpleCSRSource::CopyFrom(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, float)+0xc74) [0x7f80d2ade9e4]
  [bt] (3) /opt/conda/envs/rapids/lib/libxgboost.so(XGDMatrixCreateFromArrayInterfaces+0x1c8) [0x7f80d2a91b08]
  [bt] (4) /opt/conda/envs/rapids/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f82df0f3630]
  [bt] (5) /opt/conda/envs/rapids/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f82df0f2fed]
  [bt] (6) /opt/conda/envs/rapids/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f82df10a00e]
  [bt] (7) /opt/conda/envs/rapids/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x13a45) [0x7f82df10aa45]
  [bt] (8) /opt/conda/envs/rapids/bin/python(_PyObject_FastCallDict+0x8b) [0x5603fddf67bb]

TomAugspurger commented 4 years ago

Does this work using just XGBoost on a subset of the data? That error seems to indicate that something in xgboost can't handle a bool column?

vilmara commented 4 years ago

Hi @TomAugspurger, thank you, you are right. I dropped the boolean column (which wasn't needed) and it worked. So, what is the issue with dask-xgboost handling boolean columns, and where should I report this bug?
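
When a boolean column is actually needed, an alternative to dropping it might be to cast it to a numeric dtype before training. The following is a minimal sketch using pandas; the helper name cast_bool_columns and the sample column names are hypothetical, and whether the same calls behave identically on the cuDF-backed frames used in the notebook is an assumption that has not been verified here.

```python
import pandas as pd


def cast_bool_columns(df, dtype="float32"):
    """Return a copy of df with every boolean column cast to dtype."""
    # Collect the names of all columns whose dtype is bool.
    bool_cols = df.select_dtypes(include="bool").columns
    # astype with a mapping only touches the listed columns.
    return df.astype({col: dtype for col in bool_cols})


# Hypothetical sample data standing in for one partition of the taxi dataset.
df = pd.DataFrame({
    "fare": [5.0, 12.5, 7.25],
    "is_weekend": [True, False, True],
})
converted = cast_bool_columns(df)
print(converted.dtypes)
```

For a Dask dataframe, the same helper could presumably be applied per partition with X_train.map_partitions(cast_bool_columns) before calling train.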

TomAugspurger commented 4 years ago

Does it work if you're just using xgboost itself?

vilmara commented 4 years ago

I am working in multi-node mode; I haven't tried with xgboost itself yet.

TomAugspurger commented 4 years ago

I'd recommend trying to pass a small subset of your data to a regular xgboost train call to see if it supports boolean columns.
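
The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names xgboost_train and try_on_subset, the sample frame, and the row limit are all hypothetical. Note also that the traceback goes through xgboost's columnar (GPU dataframe) initializer, so a pandas subset may take a different code path than a cuDF one.

```python
import pandas as pd


def xgboost_train(X, y, params, num_boost_round=10):
    """Train with plain xgboost (no Dask); imported lazily so it stays optional."""
    import xgboost as xgb

    dtrain = xgb.DMatrix(X, label=y)
    return xgb.train(params, dtrain, num_boost_round=num_boost_round)


def try_on_subset(X, y, train_fn, n_rows=1000):
    """Run train_fn on the first n_rows and return (ok, error_message)."""
    X_small, y_small = X.head(n_rows), y.head(n_rows)
    try:
        train_fn(X_small, y_small)
    except Exception as exc:  # e.g. XGBoostError for unsupported input
        return False, str(exc)
    return True, None


# Hypothetical in-memory sample that includes one boolean column.
sample = pd.DataFrame({
    "fare": [5.0, 12.5, 7.25],
    "is_weekend": [True, False, True],
})
target = pd.Series([1.0, 2.0, 1.5])

# Example call (requires xgboost to be installed):
# params = {"objective": "reg:squarederror", "max_depth": 8}
# ok, err = try_on_subset(sample, target, lambda X, y: xgboost_train(X, y, params))
```

If the call fails on the subset with the same "Boolean is not supported" message, that would point at xgboost itself rather than dask-xgboost.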