dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

XGBoost - hist + learning_rate decay memory usage #3579

Closed dev7mt closed 4 years ago

dev7mt commented 6 years ago

Hey,

I have been trying to implement an eta_decay schedule that's quite specific to my project's needs, but I kept running into OutOfMemory errors. After a bit of digging, I found that setting the learning rate while using the "hist" tree_method causes the same issue, which led me to believe that the callback itself is not the problem here.

I have tested this issue in multiple environments (two different Ubuntu setups, on-premise and cloud, as well as macOS), and it always produced similar errors.

The code below should reproduce the issue:

import numpy as np
import xgboost as xgb
from psutil import virtual_memory as vm
import matplotlib.pyplot as plt

def get_used_memory():
    MEM = vm()
    return MEM.used / (1024 ** 3)

def generate_data():
    y = np.random.gamma(2, 4, OBS)
    X = np.random.normal(5, 2, [OBS, FEATURES])
    return X, y

def check_memory_callback(MEMORY_HISTORY):
    def callback(env):
        # Record (and print) system memory usage at every boosting iteration.
        state = f"[{env.iteration}]/[{env.end_iteration}]"
        memory = f"Used: {get_used_memory():.2f} GB"
        print(state, memory)
        MEMORY_HISTORY.append(get_used_memory())

    return callback

MAX_ITER = 10
ETA_BASE = 0.3
ETA_MIN = 0.1
ETA_DECAY = np.linspace(ETA_BASE, ETA_MIN, MAX_ITER).tolist()
OBS = 10 ** 6
FEATURES = 20
PARAMS = {
    'eta': ETA_BASE,
    "tree_method": "hist",
    "booster": "gbtree",
    "silient": 0,
}
NO_DECAY_HISTORY = []
DECAY_HISTORY = []
DECAY_APPROX_HISTORY = []

X_train, y_train = generate_data()
X_test, y_test = generate_data()
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
evals_result = {}

model1 = xgb.train(
    maximize=True,
    params=PARAMS,
    dtrain=dtrain,
    num_boost_round=MAX_ITER,
    early_stopping_rounds=MAX_ITER,
    evals=[(dtest, 'test')],
    evals_result=evals_result,
    verbose_eval=True,
    callbacks=[check_memory_callback(NO_DECAY_HISTORY)]
)

model2 = xgb.train(
    maximize=True,
    params=PARAMS,
    dtrain=dtrain,
    num_boost_round=MAX_ITER,
    early_stopping_rounds=MAX_ITER,
    evals=[(dtest, 'test')],
    evals_result=evals_result,
    verbose_eval=True,
    callbacks=[check_memory_callback(DECAY_HISTORY)],
    learning_rates=ETA_DECAY
)

model3 = xgb.train(
    maximize=True,
    params={'eta': ETA_BASE, "tree_method": "approx", "booster": "gbtree", "silent": 0},
    dtrain=dtrain,
    num_boost_round=MAX_ITER,
    early_stopping_rounds=MAX_ITER,
    evals=[(dtest, 'test')],
    evals_result=evals_result,
    verbose_eval=True,
    callbacks=[check_memory_callback(DECAY_APPROX_HISTORY)],
    learning_rates=ETA_DECAY
)

plt.plot(np.linspace(1, MAX_ITER, MAX_ITER), NO_DECAY_HISTORY, label="no decay", color="green")
plt.plot(np.linspace(1, MAX_ITER, MAX_ITER), DECAY_HISTORY, label="with decay", color="red")
plt.plot(np.linspace(1, MAX_ITER, MAX_ITER), DECAY_APPROX_HISTORY, label="with approx and decay", color="blue")
plt.title("XGBoost - Memory usage over iterations")
plt.legend()
plt.ylabel("System memory GB used")
plt.xlabel("Iteration")
plt.show()

Attached is a plot of memory usage from my run of the code above.

I have not dug into the underlying C++ code, but a memory leak seems plausible. As I understand it this is not the desired behaviour, though perhaps this method simply requires that much memory.

hcho3 commented 6 years ago

Is this problem confined to tree_method=hist? Did you try exact or approx?

dev7mt commented 6 years ago

I tried using the approx method and it works fine, although the results are worse and training takes more time. See model3 in the code above (the blue line on the plot):

model3 = xgb.train(
    params={'eta': ETA_BASE, "tree_method": "approx", "booster": "gbtree", "silent": 0},
    [...]
    learning_rates=ETA_DECAY
)

I did not try the exact method.

hcho3 commented 6 years ago

A memory leak seems probable. Let me look at it after the 0.80 release.

Denisevi4 commented 6 years ago

I've had this issue before. I don't know exactly what is happening, but I found a workaround.

While studying it I found that the learning_rates parameter in xgb.train actually calls a reset_learning_rate Callback. Then I tried using other custom Callbacks and I saw this memory leak as well. It looks as if once you call any Callback other than the print Callback, it causes the tree to re-initialize at every iteration.

My workaround was to add a "learning_rate_schedule" dmlc parameter and then set the new learning rate at the beginning of each iteration. It involved quite a bit of modification of the C++ code. I also saw this problem in gpu_hist, so I edited the CUDA code too. In the end my solution resets the learning rate without callbacks, and it works.
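
For reference, passing learning_rates to xgb.train appears to be just a shortcut for adding the reset_learning_rate callback explicitly, which calls set_param("learning_rate", ...) on the booster once per iteration. A minimal sketch of that equivalence (untested, assuming the pre-1.0 callback API and reusing PARAMS, dtrain, MAX_ITER and ETA_DECAY from the script above):

import xgboost as xgb

# Both calls should exercise the same code path: learning_rates is translated
# internally into the reset_learning_rate callback, which sets the learning
# rate on the booster at every boosting iteration.
model_a = xgb.train(PARAMS, dtrain, num_boost_round=MAX_ITER,
                    learning_rates=ETA_DECAY)
model_b = xgb.train(PARAMS, dtrain, num_boost_round=MAX_ITER,
                    callbacks=[xgb.callback.reset_learning_rate(ETA_DECAY)])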

kretes commented 6 years ago

@hcho3 0.80 is released, did you have a chance to look at this leakage?

@Denisevi4 can you share the code for that?

trivialfis commented 5 years ago

@Denisevi4 For the CUDA gpu_hist, did you find unusual memory usage in GPU memory, or just CPU memory? I'm currently spending time on gpu_hist, so I'll see if I can dig something out.

hcho3 commented 5 years ago

@dev7mt @Denisevi4 @kretes @trivialfis I think I found the cause of the memory leak. When the learning rate decay is enabled, FastHistMaker::Init() is called every iteration, where it should have been called only in the first iteration. The initialization function FastHistMaker::Init() allocates new objects, hence the rising memory usage over time.

I'll try to come up with a fix so that FastHistMaker::Init() is called only once.

hcho3 commented 5 years ago

Here is a snippet of diagnostic logs I injected.

Learning rate decay enabled:

xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6
[0]     test-rmse:7.7284
xgboost/src/c_api/c_api.cc:869: XGBoosterSetParam(): name = learning_rate, value = 0.2777777777777778
Tree method is selected to be 'hist', which uses a single updater grow_fast_histmaker.
xgboost/src/tree/updater_fast_hist.cc:50: FastHistMaker::Init()
xgboost/src/tree/updater_prune.cc:24: TreePruner()
xgboost/src/tree/updater_fast_hist.cc:72: FastHistMaker::Update(): is_gmat_initialized_ = false
xgboost/src/common/hist_util.cc:127: GHistIndexMatrix::Init()
xgboost/src/tree/../common/column_matrix.h:72: ColumnMatrix::Init()
xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6
[1]     test-rmse:6.82093
xgboost/src/c_api/c_api.cc:869: XGBoosterSetParam(): name = learning_rate, value = 0.25555555555555554
Tree method is selected to be 'hist', which uses a single updater grow_fast_histmaker.
xgboost/src/tree/updater_fast_hist.cc:50: FastHistMaker::Init()
xgboost/src/tree/updater_fast_hist.cc:72: FastHistMaker::Update(): is_gmat_initialized_ = false
xgboost/src/common/hist_util.cc:127: GHistIndexMatrix::Init()
xgboost/src/tree/../common/column_matrix.h:72: ColumnMatrix::Init()
xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6

Learning rate decay disabled

xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6
[0]     test-rmse:7.72278
xgboost/src/tree/updater_fast_hist.cc:72: FastHistMaker::Update(): is_gmat_initialized_ = true
xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6
[1]     test-rmse:6.75087

hcho3 commented 5 years ago

Diagnosis: The learning rate decay callback function calls XGBoosterSetParam() to update the learning rate. The XGBoosterSetParam() function in turn calls Learner::Configure(), which re-initializes each tree updater (calling FastHistMaker::Init()). The FastHistMaker updater maintains extra objects that are meant to be recycled across iterations, and re-initialization wastes memory by duplicating those internal objects.
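
Based on this diagnosis, the growth should be reproducible without xgb.train or any callback at all. Here is a minimal sketch (untested) that mimics what the decay callback does, i.e. one Booster.set_param() call, and hence one XGBoosterSetParam() / Learner::Configure() round trip, per iteration:

import numpy as np
import xgboost as xgb
from psutil import virtual_memory

# Same synthetic data shape as in the original report.
X = np.random.normal(5, 2, [10 ** 6, 20])
y = np.random.gamma(2, 4, 10 ** 6)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.Booster({"eta": 0.3, "tree_method": "hist"}, [dtrain])
for i in range(10):
    booster.update(dtrain, i)
    # Mimic the learning rate decay callback: one SetParam per iteration.
    booster.set_param("learning_rate", str(0.3 - 0.02 * i))
    print(f"[{i}] used memory: {virtual_memory().used / 1024 ** 3:.2f} GB")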

hcho3 commented 5 years ago

@dev7mt @Denisevi4 @kretes @trivialfis Fix is available at #3803.

hcho3 commented 5 years ago

@dev7mt @Denisevi4 @kretes The upcoming release (version 0.81) will not include a fix for this memory leak. The reason is that the fix is only temporary, adds a lot of maintenance burden, and will be supplanted by a future code refactor. For now, you should use approx or exact when using learning rate decay. Alternatively, check out the eta_decay_memleak branch from my fork.

trivialfis commented 5 years ago

@hcho3

The FastHistMaker updater maintains extra objects that are meant to be recycled across iterations, and re-initialization wastes memory by duplicating those internal objects.

Could you be more specific about which objects? I'm currently working on the parameter update, so I may just fix this along the way...