dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Depthwise and Lossguide Return Exactly Same Predictions #9123

Closed nyllmak closed 1 year ago

nyllmak commented 1 year ago

I've been messing around with different fits for my models. I tried experimenting with lossguide for the split decisions instead of depthwise. I noticed that I'm getting exactly the same predictions and fit with both options.

I'm using the scikit-learn interface, but according to the docs it seems like whatever I am doing shouldn't be an issue. I verified that the 'grow_policy' parameter is getting saved within the JSON of the model. I presume this means the parameter is being set and recognized (to some extent).

I've reproduced this on my end on XGB 1.6.1 and 1.7.5 using 'exact', 'approx', 'hist', and 'gpu_hist', getting the same result each time. I tried unbounding the leaves/depth but also got the same results. I would expect that unbounding depth might resolve the issue by allowing lossguide to grow, but it seems like that isn't the case.

I am using a fixed seed for these tests, but I don't see a reason why that should cause an issue, since the algorithms should still behave differently (correct me if I'm wrong on this).

Here is the minimum code that produced the issue for me:

import xgboost as xgb
import pandas as pd
import numpy as np

print(xgb.__version__) # 1.7.5
print(pd.__version__) # 1.1.5
print(np.__version__) # 1.20.3

# model 1
xgb_mod1 = xgb.XGBRegressor(tree_method='hist',
                           objective='reg:squarederror',
                           seed=42,
                           grow_policy='lossguide')

# model 2
xgb_mod2 = xgb.XGBRegressor(tree_method='hist',
                           objective='reg:squarederror',
                           seed=42,
                           grow_policy='depthwise')

# data
data = pd.DataFrame(np.random.random((3, 1000)).T, columns=['f1', 'f2', 'f3'])
data['target'] = (0.5 * data['f1']) + (2 * data['f2']**2) - data['f3']

xgb_mod1.fit(data[['f1', 'f2', 'f3']].iloc[0:500], data['target'].iloc[0:500])
pred1 = xgb_mod1.predict(data[['f1', 'f2', 'f3']].iloc[500:])
error1 = pred1 - data['target'].iloc[500:]

xgb_mod2.fit(data[['f1', 'f2', 'f3']].iloc[0:500], data['target'].iloc[0:500])
pred2 = xgb_mod2.predict(data[['f1', 'f2', 'f3']].iloc[500:])
error2 = pred2 - data['target'].iloc[500:]

print(error1)
print(error2)
print(sum(error1 - error2)) # I don't think that this should be 0

trivialfis commented 1 year ago

If both policies build the same tree with a small dataset, this result is possible.

nyllmak commented 1 year ago

@trivialfis The actual dataset I had this issue on has many models, each with a few hundred features and 15k unique training points. It's also quite responsive in terms of how it reacts to feature and parameter changes. The maxes are set at 20 levels deep and 900 estimators (it uses all 900). That's why it's surprising to me that the numbers come out exactly the same for each model in the whole ensemble when changing grow_policy.

trivialfis commented 1 year ago

apologies, posted the wrong code.

trivialfis commented 1 year ago

Tried a few examples, and the predictions are indeed the same. But the explanation is just that they really build the same tree regardless of the policy being used, only with the tree nodes reordered. Both policies keep building a tree until no new gain can be obtained by splitting a leaf, in which case the final trees are the same.

Based on this, one can make different trees if they restrict the number of leaves (which can be useful for regularization):

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
import xgboost as xgb
import numpy as np
import json

def main() -> None:
    X, y = make_regression(n_samples=2048 * 128, n_features=128, random_state=13)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=13)

    n_estimators = 128
    reg = xgb.XGBRegressor(
        n_estimators=n_estimators,
        tree_method="hist",
        grow_policy="depthwise",
        max_leaves=12,
    )
    reg.fit(X_train, y_train)
    predt_dw = reg.predict(X_test)

    reg = xgb.XGBRegressor(
        n_estimators=n_estimators,
        tree_method="hist",
        grow_policy="lossguide",
        max_leaves=12,
    )
    reg.fit(X_train, y_train)
    predt_lg = reg.predict(X_test)

    std = np.std(predt_dw - predt_lg)
    print(std, np.allclose(predt_dw, predt_lg))
    # 32.64627 False

if __name__ == "__main__":
    main()
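
The reasoning above can also be sketched with a toy model that needs no training data. This is only an illustration under stated assumptions, not XGBoost's implementation: the `GAINS` table, the heap-style node ids, the `grow` helper, and the "0 means no budget" convention are all made up for this example. Each potential split has a fixed gain; "depthwise" always expands the shallowest pending node, while "lossguide" always expands the pending node with the highest gain.

```python
# Toy model only: gains are fixed up front and keyed by heap index
# (root = 1, children of node n are 2n and 2n + 1). Nodes missing from the
# table have no positive gain and are never split. This is NOT XGBoost's code.
GAINS = {1: 10.0, 2: 1.0, 3: 8.0, 4: 0.5, 6: 5.0, 7: 4.0, 12: 3.0}


def grow(gains, policy, max_leaves=0):
    """Return node ids in the order they get split.

    max_leaves=0 means "no leaf budget" (a convention chosen for this toy).
    """
    candidates = [1] if gains.get(1, 0.0) > 0 else []
    splits = []
    n_leaves = 1  # the unsplit root counts as one leaf
    while candidates and (max_leaves == 0 or n_leaves < max_leaves):
        if policy == "depthwise":
            # Smallest heap index = shallowest, leftmost pending node.
            node = min(candidates)
        elif policy == "lossguide":
            # Pending node with the largest gain, wherever it sits.
            node = max(candidates, key=lambda n: gains[n])
        else:
            raise ValueError(f"unknown policy: {policy}")
        candidates.remove(node)
        splits.append(node)
        n_leaves += 1  # one leaf is replaced by two
        for child in (2 * node, 2 * node + 1):
            if gains.get(child, 0.0) > 0:
                candidates.append(child)
    return splits


if __name__ == "__main__":
    # No leaf budget: same set of splits, visited in a different order.
    print(grow(GAINS, "depthwise"), grow(GAINS, "lossguide"))
    # Leaf budget of 4: the policies now spend the budget on different nodes.
    print(grow(GAINS, "depthwise", 4), grow(GAINS, "lossguide", 4))
```

Without a budget, both policies end up splitting every node that has positive gain, so the resulting trees match up to node order. Once the number of leaves is capped, the order of expansion decides which splits make it into the tree, which is where the two policies start to differ.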
nyllmak commented 1 year ago

@trivialfis Thanks for the pointers. I guess I was being a bit too generous with my allowances for how complex the trees are allowed to become.

I'll mark the issue as resolved and try to be more aggressive with my restrictions on the tree growth, since it seems like user error on my end. I was expecting there to be some more divergence between the methods, but I suppose that is not the case.

Thanks for the help.