dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

ArrayInterface handler for cuDF DataFrame cannot yet handle Boolean columns #10181

Open hcho3 opened 6 months ago

hcho3 commented 6 months ago

The ArrayInterface class in XGBoost does not yet support Boolean columns, so it throws the following error whenever Boolean columns are passed in:

/workspace/src/c_api/../data/array_interface.h:500: Boolean-1 is not supported.

The error only affects cuDF DataFrames, since the handler for Pandas DataFrames converts Boolean columns to a float type.

Context. I encountered this error while working on #10175. Starting from Pandas 2.0, pd.get_dummies() returns Boolean type, instead of uint8 (https://github.com/pandas-dev/pandas/pull/48022).
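
Until Boolean columns are supported, one possible workaround is to keep Boolean dtypes out of the frame before it reaches XGBoost. The sketch below uses plain Pandas to show the idea. The helper names `dummies_as_float` and `cast_bool_columns` are hypothetical and not part of XGBoost or Pandas, and it is only an assumption that the same `dtype=` and `astype` calls behave the same way on a cuDF DataFrame.

```python
import pandas as pd


def dummies_as_float(df, columns):
    """One-hot encode `columns`, forcing a numeric dtype for the indicators.

    Passing dtype= avoids the Boolean indicator columns that
    pd.get_dummies() returns by default from Pandas 2.0 onward.
    """
    return pd.get_dummies(df, columns=columns, dtype="float32")


def cast_bool_columns(df, dtype="float32"):
    """Cast any existing Boolean columns to `dtype`, leaving other columns as-is."""
    bool_cols = df.select_dtypes(include="bool").columns
    return df.astype({col: dtype for col in bool_cols})


if __name__ == "__main__":
    frame = pd.DataFrame({"color": ["r", "g", "r"], "x": [1.0, 2.0, 3.0]})
    print(dummies_as_float(frame, ["color"]).dtypes)

    flags = pd.DataFrame({"flag": [True, False], "x": [1, 2]})
    print(cast_bool_columns(flags).dtypes)
```

Casting to a float type mirrors what the Pandas handler is described as doing above; `uint8` would restore the pre-2.0 `pd.get_dummies()` output instead.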

trivialfis commented 6 months ago

We need to use the bit mask for accessing boolean columns.
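
For readers unfamiliar with bit-mask access, here is a minimal pure-Python sketch of reading a bit-packed Boolean buffer. It assumes an Arrow-style layout in which each byte holds eight values, least-significant bit first. The comment above does not specify the layout, and this is not XGBoost's implementation. The function names are hypothetical.

```python
def read_packed_bool(buffer: bytes, index: int) -> bool:
    """Return the Boolean stored at `index` in a bit-packed buffer.

    Assumed layout: value i lives in byte i // 8, at bit i % 8,
    counting from the least-significant bit.
    """
    byte = buffer[index // 8]
    return bool((byte >> (index % 8)) & 1)


def unpack_bools(buffer: bytes, length: int) -> list:
    """Expand the first `length` packed bits into a list of Python bools."""
    return [read_packed_bool(buffer, i) for i in range(length)]


if __name__ == "__main__":
    # Bits 0 and 2 set in the first byte, bit 0 set in the second byte.
    packed = bytes([0b00000101, 0b00000001])
    print(unpack_bools(packed, 9))
```

The point of the sketch is that a Boolean element cannot be addressed with an ordinary per-element byte stride under this layout; each read needs a byte lookup followed by a shift and a mask.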