dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Improve GPU sketching with skewed data. #7946

Open trivialfis opened 2 years ago

trivialfis commented 2 years ago

We should run https://github.com/dmlc/xgboost/blob/18cbebaeb9c04dfb112b7f42c7b15087bc6a190b/src/common/quantile.cu#L351 before pruning, so that the weight accumulated for duplicates of the same element is more accurate.
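To make the ordering concern concrete, here is a toy sketch in Python. It is not XGBoost's GPU sketch: it assumes the linked step combines entries that share the same value, and `merge_duplicates`, `naive_prune`, and `weight_of` are hypothetical helpers written only for this illustration. The naive prune simply drops entries, whereas a real quantile summary also tracks rank bounds, so this only shows why the order of the two steps can matter on skewed data.

```python
from itertools import groupby


def merge_duplicates(entries):
    """Combine adjacent (value, weight) entries with equal value, summing weights.

    The input must already be sorted by value.
    """
    return [
        (value, sum(w for _, w in group))
        for value, group in groupby(entries, key=lambda e: e[0])
    ]


def naive_prune(entries, k):
    """Keep at most k entries at evenly spaced positions (toy stand-in for pruning)."""
    n = len(entries)
    if n <= k:
        return list(entries)
    # Integer arithmetic keeps the first and last entries and avoids rounding issues.
    idx = sorted({i * (n - 1) // (k - 1) for i in range(k)})
    return [entries[i] for i in idx]


def weight_of(entries, value):
    """Total weight recorded for a given value."""
    return sum(w for v, w in entries if v == value)


# Skewed input: the value 1.0 appears 90 times, the values 2.0..11.0 once each.
data = [(1.0, 1.0)] * 90 + [(float(v), 1.0) for v in range(2, 12)]

merge_then_prune = naive_prune(merge_duplicates(data), 10)
prune_then_merge = merge_duplicates(naive_prune(data, 10))

print(weight_of(merge_then_prune, 1.0))  # 90.0: full weight of the heavy value survives
print(weight_of(prune_then_merge, 1.0))  # 9.0: most duplicates were dropped before merging
```

In this toy setup, merging first lets the heavy value carry its full weight into the pruned summary, while pruning first discards most of its duplicates before their weights can be accumulated.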

trivialfis commented 2 years ago

This might increase GPU memory usage several times over. We need a better idea.