dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0
26.14k stars 8.71k forks

[pyspark] Categorical data support. #8398

Open trivialfis opened 1 year ago

trivialfis commented 1 year ago

We need tests for both CPU and GPU pyspark with categorical feature inputs.
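
A minimal sketch of what such a test could look like, not the project's actual test code. It assumes categorical columns are marked through the `feature_types` parameter (`"c"` for categorical, `"q"` for numeric) and that the `device` parameter from XGBoost 2.0 or later selects CPU or GPU. Whether this path works end to end in pyspark is what the tests would need to establish. The helper `feature_types_for`, the function `fit_with_categorical`, and the column names are hypothetical.

```python
from typing import List, Sequence

# Device matrix requested in the issue: one run on CPU, one on GPU.
DEVICES = ["cpu", "cuda"]


def feature_types_for(columns: Sequence[str], categorical: Sequence[str]) -> List[str]:
    """Return XGBoost feature-type codes: "c" for categorical, "q" for numeric."""
    unknown = set(categorical) - set(columns)
    if unknown:
        raise ValueError(f"unknown categorical columns: {sorted(unknown)}")
    return ["c" if name in categorical else "q" for name in columns]


def fit_with_categorical(spark, device: str):
    """Hypothetical test body; requires pyspark and xgboost to be installed."""
    from xgboost.spark import SparkXGBClassifier

    columns = ["color", "size"]
    # "color" holds integer category codes (0, 1, 2) stored as floats.
    rows = [(0.0, 1.5, 0), (1.0, 0.3, 1), (2.0, 2.2, 0), (1.0, 0.9, 1)]
    df = spark.createDataFrame(rows, columns + ["label"])

    clf = SparkXGBClassifier(
        features_col=columns,  # list of numeric columns instead of a vector column
        label_col="label",
        feature_types=feature_types_for(columns, ["color"]),  # assumption: "c" marks categorical
        device=device,  # assumption: XGBoost >= 2.0
    )
    return clf.fit(df)
```

A test suite could call `fit_with_categorical(spark, device)` once per entry in `DEVICES`, skipping the `"cuda"` case when no GPU is available.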

mtreca commented 11 months ago

I would be really interested in switching to XGB with categorical support in Spark. Are there any updates regarding this feature? Thanks!