dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Passing `updater` parameter produces warning about ignored `tree_method` even if the latter is not passed #9964

Open david-cortes opened 8 months ago

david-cortes commented 8 months ago

Passing the `updater` parameter makes xgboost emit a warning that `tree_method` will be ignored, even when `tree_method` is not passed. The warning occurs in both Python and R, and only the first time `xgb.train` is executed with that parameter.

import numpy as np, xgboost as xgb
rng = np.random.default_rng(seed=123)
y = rng.integers(2, size=100)
X = rng.standard_normal(size=(100,10))
dm = xgb.DMatrix(X, y)
model = xgb.train(
    dtrain=dm,
    num_boost_round=10,
    params={
        "objective" : "binary:logistic",
        "max_depth" : 2,
        "eta" : 0.05,
        "updater" : "grow_colmaker,prune",
    }
)
WARNING: You have manually specified the `updater` parameter. The `tree_method` parameter will be ignored. Incorrect sequence of updaters will produce undefined behavior. For common uses, we recommend using `tree_method` parameter instead.

Subsequent executions of the same call to `xgb.train` do not produce the warning again.

trivialfis commented 8 months ago

Thank you for raising the issue.

It's a really old warning that predates my involvement. From my understanding, xgboost initially tried to support a sequence of updaters like the one in your snippet. In practice this doesn't work in most cases, because the updater/optimizer keeps internal state to track things like the prediction cache. In addition, the prune updater is not very useful during training, since not growing the tree is the same as (if not better than) pruning it.

We kept the parameter for gbtree, but recommend that people use `tree_method` instead.

I will try to promote the rest of the updaters into tree methods.