dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

XGBoost - hist + learning_rate decay memory usage #3579

Closed dev7mt closed 4 years ago

dev7mt commented 6 years ago

Hey,

I have been trying to implement an eta_decay schedule that's quite specific to my project's needs, but I kept running into OutOfMemory errors. After a bit of digging, I found that setting the learning rate while using the "hist" tree_method causes the same issue, which led me to believe that the callback itself is not the problem here.

I have tested this issue in multiple environments (two different Ubuntu setups, on-premise and cloud, as well as macOS), and it always produced similar errors.

The code below should reproduce the issue:

import numpy as np
import xgboost as xgb
from psutil import virtual_memory as vm
import matplotlib.pyplot as plt

def get_used_memory():
    MEM = vm()
    return MEM.used / (1024 ** 3)

def generate_data():
    y = np.random.gamma(2, 4, OBS)
    X = np.random.normal(5, 2, [OBS, FEATURES])
    return X, y

def check_memory_callback(MEMORY_HISTORY):
    def callback(env):
        # Record (and print) system memory usage at every boosting iteration.
        state = f"[{env.iteration}]/[{env.end_iteration}]"
        memory = f"Used: {get_used_memory():.2f} GB"
        print(state, memory)
        MEMORY_HISTORY.append(get_used_memory())

    return callback

MAX_ITER = 10
ETA_BASE = 0.3
ETA_MIN = 0.1
ETA_DECAY = np.linspace(ETA_BASE, ETA_MIN, MAX_ITER).tolist()
OBS = 10 ** 6
FEATURES = 20
PARAMS = {
    'eta': ETA_BASE,
    "tree_method": "hist",
    "booster": "gbtree",
    "silient": 0,
}
NO_DECAY_HISTORY = []
DECAY_HISTORY = []
DECAY_APPROX_HISTORY = []

X_train, y_train = generate_data()
X_test, y_test = generate_data()
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
evals_result = {}

model1 = xgb.train(
    maximize=True,
    params=PARAMS,
    dtrain=dtrain,
    num_boost_round=MAX_ITER,
    early_stopping_rounds=MAX_ITER,
    evals=[(dtest, 'test')],
    evals_result=evals_result,
    verbose_eval=True,
    callbacks=[check_memory_callback(NO_DECAY_HISTORY)]
)

model2 = xgb.train(
    maximize=True,
    params=PARAMS,
    dtrain=dtrain,
    num_boost_round=MAX_ITER,
    early_stopping_rounds=MAX_ITER,
    evals=[(dtest, 'test')],
    evals_result=evals_result,
    verbose_eval=True,
    callbacks=[check_memory_callback(DECAY_HISTORY)],
    learning_rates=ETA_DECAY
)

model3 = xgb.train(
    maximize=True,
    params={'eta': ETA_BASE, "tree_method": "approx", "booster": "gbtree", "silent": 0},
    dtrain=dtrain,
    num_boost_round=MAX_ITER,
    early_stopping_rounds=MAX_ITER,
    evals=[(dtest, 'test')],
    evals_result=evals_result,
    verbose_eval=True,
    callbacks=[check_memory_callback(DECAY_APPROX_HISTORY)],
    learning_rates=ETA_DECAY
)

plt.plot(np.linspace(1, MAX_ITER, MAX_ITER), NO_DECAY_HISTORY, label="no decay", color="green")
plt.plot(np.linspace(1, MAX_ITER, MAX_ITER), DECAY_HISTORY, label="with decay", color="red")
plt.plot(np.linspace(1, MAX_ITER, MAX_ITER), DECAY_APPROX_HISTORY, label="with approx and decay", color="blue")
plt.title("XGBoost - Memory usage over iterations")
plt.legend()
plt.ylabel("System memory GB used")
plt.xlabel("Iteration")
plt.show()

Attached is a plot of memory usage from my run of the code above.

I have not dug into the underlying C++ code, but a memory leak seems plausible. As I understand it this is not the desired behaviour, though perhaps this method simply requires that much memory.

hcho3 commented 6 years ago

Is this problem confined to tree_method=hist? Did you try exact or approx?

dev7mt commented 6 years ago

I tried using the approx method and it works fine, although the results are worse and training takes more time. See model3 in the code above (the blue line on the plot):

model3 = xgb.train(
    params={'eta': ETA_BASE, "tree_method": "approx", "booster": "gbtree", "silent": 0},
    [...]
    learning_rates=ETA_DECAY
)

I did not try the exact method.

hcho3 commented 6 years ago

A memory leak seems probable. Let me look at it after the 0.80 release.

Denisevi4 commented 6 years ago

I've had this issue before. I don't know exactly what is happening, but I found a workaround.

While studying it I found that the learning_rates parameter in xgb.train actually calls a reset_learning_rate Callback. Then I tried using other custom Callbacks and I saw this memory leak as well. It looks as if once you call any Callback other than the print Callback, it causes the tree to re-initialize at every iteration.

My workaround was to add a "learning_rate_schedule" dmlc parameter and then set the new learning rate at the beginning of each iteration. It involved quite a bit of modification of the C++ code. I also saw this problem in gpu_hist, so I edited the CUDA code too. In the end my solution resets the learning rate without callbacks, and it works.
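
For reference, passing learning_rates to xgb.train appears to be just a shortcut for adding the reset_learning_rate callback explicitly, which calls set_param("learning_rate", ...) on the booster once per iteration. A minimal sketch of that equivalence (untested, assuming the pre-1.0 callback API and reusing PARAMS, dtrain, MAX_ITER and ETA_DECAY from the script above):

import xgboost as xgb

# Both calls should exercise the same code path: learning_rates is translated
# internally into the reset_learning_rate callback, which sets the learning
# rate on the booster at every boosting iteration.
model_a = xgb.train(PARAMS, dtrain, num_boost_round=MAX_ITER,
                    learning_rates=ETA_DECAY)
model_b = xgb.train(PARAMS, dtrain, num_boost_round=MAX_ITER,
                    callbacks=[xgb.callback.reset_learning_rate(ETA_DECAY)])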

kretes commented 6 years ago

@hcho3 0.80 is released, did you have a chance to look at this leakage?

@Denisevi4 can you share the code for that?

trivialfis commented 5 years ago

@Denisevi4 For the CUDA gpu_hist, did you find unusual memory usage in GPU memory, or just CPU memory? I'm currently spending time on gpu_hist, so I'll see if I can dig something out.

hcho3 commented 5 years ago

@dev7mt @Denisevi4 @kretes @trivialfis I think I found the cause of the memory leak. When the learning rate decay is enabled, FastHistMaker::Init() is called every iteration, where it should have been called only in the first iteration. The initialization function FastHistMaker::Init() allocates new objects, hence the rising memory usage over time.

I'll try to come up with a fix so that FastHistMaker::Init() is called only once.

hcho3 commented 5 years ago

Here is a snippet of diagnostic logs I injected.

Learning rate decay enabled:

xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6
[0]     test-rmse:7.7284
xgboost/src/c_api/c_api.cc:869: XGBoosterSetParam(): name = learning_rate, value = 0.2777777777777778
Tree method is selected to be 'hist', which uses a single updater grow_fast_histmaker.
xgboost/src/tree/updater_fast_hist.cc:50: FastHistMaker::Init()
xgboost/src/tree/updater_prune.cc:24: TreePruner()
xgboost/src/tree/updater_fast_hist.cc:72: FastHistMaker::Update(): is_gmat_initialized_ = false
xgboost/src/common/hist_util.cc:127: GHistIndexMatrix::Init()
xgboost/src/tree/../common/column_matrix.h:72: ColumnMatrix::Init()
xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6
[1]     test-rmse:6.82093
xgboost/src/c_api/c_api.cc:869: XGBoosterSetParam(): name = learning_rate, value = 0.25555555555555554
Tree method is selected to be 'hist', which uses a single updater grow_fast_histmaker.
xgboost/src/tree/updater_fast_hist.cc:50: FastHistMaker::Init()
xgboost/src/tree/updater_fast_hist.cc:72: FastHistMaker::Update(): is_gmat_initialized_ = false
xgboost/src/common/hist_util.cc:127: GHistIndexMatrix::Init()
xgboost/src/tree/../common/column_matrix.h:72: ColumnMatrix::Init()
xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6

Learning rate decay disabled

xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6
[0]     test-rmse:7.72278
xgboost/src/tree/updater_fast_hist.cc:72: FastHistMaker::Update(): is_gmat_initialized_ = true
xgboost/src/tree/updater_prune.cc:75: tree pruning end, 1 roots, 126 extra nodes, 0 pruned nodes, max_depth=6
[1]     test-rmse:6.75087

hcho3 commented 5 years ago

Diagnosis: The learning rate decay callback function calls XGBoosterSetParam() to update the learning rate. The XGBoosterSetParam() function in turn calls Learner::Configure(), which re-initializes each tree updater (calling FastHistMaker::Init()). The FastHistMaker updater maintains extra objects that are meant to be recycled across iterations, and re-initialization wastes memory by duplicating those internal objects.
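
Based on this diagnosis, the growth should be reproducible without xgb.train or any callback at all. Here is a minimal sketch (untested) that mimics what the decay callback does, i.e. one Booster.set_param() call, and hence one XGBoosterSetParam() / Learner::Configure() round trip, per iteration:

import numpy as np
import xgboost as xgb
from psutil import virtual_memory

# Same synthetic data shape as in the original report.
X = np.random.normal(5, 2, [10 ** 6, 20])
y = np.random.gamma(2, 4, 10 ** 6)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.Booster({"eta": 0.3, "tree_method": "hist"}, [dtrain])
for i in range(10):
    booster.update(dtrain, i)
    # Mimic the learning rate decay callback: one SetParam per iteration.
    booster.set_param("learning_rate", str(0.3 - 0.02 * i))
    print(f"[{i}] used memory: {virtual_memory().used / 1024 ** 3:.2f} GB")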

hcho3 commented 5 years ago

@dev7mt @Denisevi4 @kretes @trivialfis Fix is available at #3803.

hcho3 commented 5 years ago

@dev7mt @Denisevi4 @kretes The upcoming release (version 0.81) will not include a fix for this memory leak. The reason is that the fix is only temporary, adds a lot of maintenance burden, and will be supplanted by a future code refactor. For now, you should use approx or exact when using learning rate decay. Alternatively, check out the eta_decay_memleak branch from my fork.

trivialfis commented 5 years ago

@hcho3

The FastHistMaker updater maintains extra objects that are meant to be recycled across iterations, and re-initialization wastes memory by duplicating those internal objects.

Could you be more specific about which objects? I'm currently working on the parameter update, so I may just fix this along the way...