dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Reproducibility: different result on first run with gpu_hist on single GPU #8820

Open · cstefansen opened this issue 1 year ago

cstefansen commented 1 year ago

Based on https://github.com/dmlc/xgboost/issues/5023 it seems like XGBoost aims to guarantee reproducibility for single GPU training with gpu_hist. That is, training again on the same hardware with the same data and the same seed should give precisely the same model bit for bit.
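
(To make "bit for bit" concrete: one way to check it, not taken from #5023, is to compare the raw serialized boosters of two fitted models. model_a and model_b below are placeholders for two regressors trained with identical data, settings, and seed.)

raw_a = bytes(model_a.get_booster().save_raw())  # serialized booster bytes
raw_b = bytes(model_b.get_booster().save_raw())
assert raw_a == raw_b  # holds iff the two models are bit-for-bit identical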

However, I am consistently seeing different results on the very first training run (on a freshly started Python interpreter - this is important) for the following code:

import numpy as np
import xgboost

n_rows = 25_000
n_features = 1_000
n_rows_val = 12_500

np.random.seed(0)

x = np.random.normal(-3.0e-05, 0.5, (n_rows, n_features))
y = np.random.normal(-0.05, 1.0, n_rows)
w = np.clip(np.random.normal(7.25e+06, 1.25e+07, n_rows), 0.0, None)

x_val = np.random.normal(-3.0e-05, 0.5, (n_rows_val, n_features))
y_val = np.random.normal(-0.05, 1.0, n_rows_val)
w_val = np.clip(np.random.normal(7.25e+06, 1.25e+07, n_rows_val), 0.0, None)

print(f'XGBoost version {xgboost.__version__}')

models = []

for i in range(3):
    xgbr = xgboost.XGBRegressor(
        gpu_id=0,
        tree_method='gpu_hist',
        sampling_method='gradient_based',
        verbosity=0,
        booster='gbtree',
        n_jobs=1,
        nthreads=1,
        random_state=np.random.RandomState(0),
        seed=0,
        single_precision_histogram=False,
        max_delta_step=0,
        colsample_bylevel=1.0,
        scale_pos_weight=1.0,
        base_score=0.0,
        colsample_bynode=0.5,
        colsample_bytree=0.13,
        gamma=7_500,
        objective='reg:squarederror',
        learning_rate=0.007,
        max_depth=6,
        min_child_weight=30_000,
        n_estimators=2_500,
        reg_alpha=8.0,
        reg_lambda=0.5,
        subsample=0.45,
    )

    xgbr.fit(x, y, sample_weight=w)
    score = xgbr.score(x_val, y_val, sample_weight=w_val)
    models.append(xgbr)
    print(i, score)

which results in

0 -0.009656077054927659
1 -0.010103391088486235
2 -0.010103391088486235

This is on Linux with a Tesla T4 and CUDA 11.7.

Is there another seed that needs to be set to ensure that the first run works off of the same seed as the subsequent runs? Or is this potentially a bug?

trivialfis commented 1 year ago

I think it's caused by the global random engine used inside xgboost. The booster trained in the second iteration is affected by the one from the first iteration as they share the same random engine.

trivialfis commented 1 year ago

It's similar to calling x = np.random.normal(-3.0e-05, 0.5, (n_rows, n_features)) twice: the second x differs from the first even though the NumPy seed was set once, because each call advances the same global engine.
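
A minimal sketch of the analogy (illustration only):

import numpy as np

np.random.seed(0)             # seed the global engine once
a = np.random.normal(size=3)  # first draw
b = np.random.normal(size=3)  # second draw: same engine, advanced state
print((a == b).all())         # False: the seed was set, but the state moved on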

cstefansen commented 1 year ago

@trivialfis, to stay with your analogy, it is possible to get reproducible results by saying:

np.random.seed(0)
x1 = np.random.normal(-3.0e-05, 0.5, (n_rows, n_features))

np.random.seed(0)
x2 = np.random.normal(-3.0e-05, 0.5, (n_rows, n_features))

np.testing.assert_equal(x1, x2)

Is there a way to achieve the same reproducibility for XGBoost?

The example in the original repro does in fact produce reproducible (i.e., identical) results in each iteration when run on a CPU. However, when run on a single GPU, the first run is always different from the subsequent runs (I ran this with 1000 iterations and got 999 identical models after the first one).

This seems to be a seed/RNG initialization issue within the GPU code: once the Python interpreter has run the repro once, it produces identical results on every subsequent run, and I have to restart the interpreter to get a different model on the first iteration again. If that is the case, a throwaway warm-up fit might mask the problem, as sketched below.
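
An untested sketch of that workaround (the warm-up parameters are arbitrary; x and y are from the repro above):

# Hypothetical workaround: if only the very first fit in a fresh process
# diverges, a cheap throwaway fit may leave the process in the state that
# all subsequent runs start from.
warmup = xgboost.XGBRegressor(gpu_id=0, tree_method='gpu_hist',
                              n_estimators=1, random_state=0)
warmup.fit(x[:100], y[:100])  # discard the model; run only for side effects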

mingli-ts commented 1 year ago

@trivialfis, just want to bump this up. Do you know if there is a way to set the seed of the global random engine you mentioned? As the example above shows, setting np.random.seed does not fix the issue. The weird part is that only the first run is non-deterministic; the following runs all give the same results.

trivialfis commented 1 year ago

Let me take another look later