dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

how to get the time of one boost round? #7443

Closed showkeyjar closed 2 years ago

showkeyjar commented 2 years ago

I use grid search and Optuna to search for the best XGBoost hyperparameters.

However, I found that when the trees are deep and large, each boost round becomes very slow and costs too much time.

Large-tree parameters are usually not used in a production system, so searching over them is mostly wasted time.

I want a way to measure the time of each boost round, and stop training if it takes too long; this strategy could also help avoid OOM.

How can I do this? Thanks!

trivialfis commented 2 years ago

Hi, you can define a callback function. Please see Python doc and demo. https://xgboost.readthedocs.io/en/stable/python/callbacks.html

showkeyjar commented 2 years ago

I read the doc and demo. I can measure the time with a callback, but there is no information on how to stop training from inside a callback function.

trivialfis commented 2 years ago

In the demo there's an `after_iteration` method defined in the callback class, which returns a boolean value. According to the documentation of its parent class `TrainingCallback` https://github.com/dmlc/xgboost/blob/b0015fda9658d44c1e7ac3eb0361ba8d1f44bcdc/python-package/xgboost/callback.py#L44 , it should return True when training should stop. Likewise, in the demo https://github.com/dmlc/xgboost/blob/b0015fda9658d44c1e7ac3eb0361ba8d1f44bcdc/demo/guide-python/callbacks.py#L50 it returns False when training should continue.

It might not be intuitive enough; could you please open a PR with suggested changes to the document so that it is more friendly to new users?