The Caveats section in the docs for running Random Forests in XGBoost says:
XGBoost uses 2nd order approximation to the objective function. This can lead to results that differ from a random forest implementation that uses the exact value of the objective function.
The second-order approximation is not the main source of the difference from sklearn's results. The main difference seems to be the objective function: "gini" for sklearn and "logloss" (?) for XGBoost (please correct me if I am wrong).
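For illustration, here is a minimal sketch of the two setups I mean; the hyperparameters are just placeholders, and I am spelling out the defaults (criterion="gini" for sklearn, objective="binary:logistic" for XGBoost's random-forest wrapper) rather than relying on them implicitly:

```python
# Sketch of the two random-forest setups being compared; settings are illustrative.
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBRFClassifier

# sklearn grows each tree by greedily minimizing Gini impurity at the splits.
sk_rf = RandomForestClassifier(n_estimators=100, criterion="gini")

# XGBoost's random-forest wrapper instead optimizes the logistic loss
# through its gradient/Hessian machinery.
xgb_rf = XGBRFClassifier(n_estimators=100, objective="binary:logistic")
```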
And the choice of objective function, not the order of approximation, is what affects the probability calibration curves: the calibration curves for XGBoost with booster="gbtree" are, as expected, perfectly calibrated on larger datasets.
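A rough sketch of how such a calibration comparison could be reproduced with sklearn's calibration_curve; the synthetic dataset, sample size, and bin count are arbitrary choices for illustration:

```python
# Compare calibration of sklearn RF vs XGBoost RF on a synthetic binary task.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import calibration_curve
from xgboost import XGBRFClassifier

X, y = make_classification(n_samples=50_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "sklearn RF (gini)": RandomForestClassifier(n_estimators=100, random_state=0),
    "XGBoost RF (logistic loss)": XGBRFClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    # A well-calibrated model keeps the fraction of positives close to the
    # mean predicted probability in every bin.
    frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=10)
    print(name, list(zip(mean_pred.round(2), frac_pos.round(2))))
```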
So my proposal here is to add this to the docs (assuming I am right)
That's correct; the description should be reworded to point out that the gini criterion of random forest is different from the logloss objective used in XGBoost.