dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Optimization for Iterative Evaluation for XGBoost #8753

Open dthiagarajan opened 1 year ago

dthiagarajan commented 1 year ago

Hello,

I'm running an XGBoost experiment where I build the model one tree at a time and evaluate it after every tree I add. However, I'm noticing that prediction takes longer as I add more trees, which makes sense, given that the ensemble is growing. Is there any way to predict on the dataset without recomputing the predictions of the trees I've already evaluated? e.g. if I have n trees in the forest already, and I've already predicted with those n trees and cached those predictions, is there a function I can write that takes those cached predictions and the new booster object and returns the predictions of the forest of n+1 trees?

Thanks in advance!

trivialfis commented 1 year ago

Hi, if you are using the DMatrix object from the native interface, the prediction is cached.