[feature ] return the best model after early stop

dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

https://xgboost.readthedocs.io/en/stable/

Apache License 2.0

26.17k stars 8.71k forks

[feature ] return the best model after early stop #10816

Open sonetto19999 opened 4 weeks ago

sonetto19999 commented 4 weeks ago

Does xgboost4j-spark support getting the best model after early stopping?

It seems the returned model is the one from the last iteration, i.e. best iteration + num_early_stopping_rounds. Am I wrong? How can I get the best model?
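To make the question concrete, here is a toy sketch in plain Python of how early stopping leaves extra trees after the best iteration. It does not use XGBoost at all: `train_with_early_stopping`, the `tree_i` placeholders, and the validation scores are all hypothetical, and a lower score is assumed to be better.

```python
def train_with_early_stopping(valid_scores, early_stopping_rounds):
    """Simulate boosting: one 'tree' per round, stop when the validation
    score has not improved for `early_stopping_rounds` rounds."""
    trees = []
    best_score = float("inf")
    best_iteration = -1
    for i, score in enumerate(valid_scores):
        trees.append(f"tree_{i}")  # placeholder for a fitted tree
        if score < best_score:
            best_score, best_iteration = score, i
        elif i - best_iteration >= early_stopping_rounds:
            break  # no improvement for `early_stopping_rounds` rounds
    return trees, best_iteration


# Hypothetical validation losses: the minimum is at iteration 2.
scores = [0.9, 0.7, 0.6, 0.65, 0.66, 0.7, 0.8, 0.9]
trees, best_iteration = train_with_early_stopping(scores, early_stopping_rounds=3)

# The model at the stopping point holds the trees trained after the best round too.
print(len(trees), best_iteration)

# Keeping only the first best_iteration + 1 trees recovers the best model.
best_trees = trees[: best_iteration + 1]
print(best_trees)
```

In this toy setup the model at the stopping point contains `best_iteration + early_stopping_rounds + 1` trees, so getting the best model means truncating it, either by slicing the model or by limiting the trees used at prediction time.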

wbo4958 commented 4 weeks ago

I guess you can specify how many trees to use when predicting. But xgboost4j does not currently support specifying the tree limit. I will make a PR for it.

trivialfis commented 3 weeks ago

The JVM package doesn't support model slicing yet. @wbo4958 this might help: https://github.com/dmlc/xgboost/blob/67c8c967845c05eb52e13bdee478db4cc37a0c09/demo/guide-python/individual_trees.py#L61

wbo4958 commented 3 weeks ago

Cool, looks like it's doable.