Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
Please see the tutorial changes for the new features. In summary, the Dask interface now offers two ways to handle query groups, depending on whether a global sort is desired.
Please note that this PR ensures the accuracy of the model at a heavy performance cost. A global sort is expensive, and pushing a huge number of small partitions into the QuantileDMatrix is also inefficient.
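
To make the trade-off concrete, here is a minimal plain-Python sketch of why the choice matters. It is not the Dask interface or the code in this PR; the helper names (`split_evenly`, `sort_and_split_on_groups`, `groups_split_across`) and the toy data are assumptions made up for illustration. It only shows that partitioning without regard to query IDs scatters a group across partitions, while a global sort keeps each group contiguous at the cost of sorting everything and producing uneven partitions.

```python
from itertools import groupby

# Toy data: (qid, row_index) pairs with interleaved query IDs (hypothetical).
rows = [(q, i) for i, q in enumerate([2, 0, 1, 0, 2, 1, 0, 2, 1, 1])]


def split_evenly(rows, n_parts):
    """Cut rows into contiguous chunks of equal size, ignoring query IDs."""
    size = -(-len(rows) // n_parts)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def sort_and_split_on_groups(rows, n_parts):
    """Globally sort by qid, then cut partitions only at group boundaries."""
    ordered = sorted(rows, key=lambda r: r[0])
    groups = [list(g) for _, g in groupby(ordered, key=lambda r: r[0])]
    target = -(-len(rows) // n_parts)
    parts, current = [], []
    for group in groups:
        too_big = len(current) + len(group) > target
        # Start a new partition only when the current one is non-empty,
        # would overflow the target, and partitions are still available.
        if current and too_big and len(parts) < n_parts - 1:
            parts.append(current)
            current = []
        current.extend(group)
    if current:
        parts.append(current)
    return parts


def groups_split_across(parts):
    """Return the query IDs that appear in more than one partition."""
    seen = {}
    for idx, part in enumerate(parts):
        for qid, _ in part:
            seen.setdefault(qid, set()).add(idx)
    return sorted(q for q, where in seen.items() if len(where) > 1)


naive = split_evenly(rows, 2)
sorted_parts = sort_and_split_on_groups(rows, 2)
print(groups_split_across(naive))         # groups broken by the naive split
print(groups_split_across(sorted_parts))  # groups broken after the global sort
print([len(p) for p in sorted_parts])     # partition sizes after the sort
```

In this toy case the sorted variant keeps every group whole but yields unbalanced partitions, which mirrors the performance concern noted above.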
related: