Xtra-Computing / thundergbm

ThunderGBM: Fast GBDTs and Random Forests on GPUs
Apache License 2.0

Out of memory when depth or n_trees are too large #41

Closed TAOinfoWr closed 4 years ago

TAOinfoWr commented 4 years ago

I have the same out-of-memory problem. From my observation, it seems related to model complexity (e.g., depth or n_trees being too large?). Is there any method to solve it?

Below are my code, OS/GPU information, and the error messages.

from thundergbm import TGBMClassifier
from sklearn import datasets
import numpy as np

# synthetic data: 30,000 rows, 9 features, 7 classes
dim = 9
row_num = 30000
X = np.random.random((row_num, dim))
X = X.astype('float32')
y = np.random.randint(0, 7, row_num)
y = y.astype('int32')

clf = TGBMClassifier(depth=18, n_trees=1000, verbose=0, bagging=0)
clf.fit(X, y)

------------------------------------------------------------------------

OS: Ubuntu 16.04
NVIDIA driver: 440.44
CUDA Version: 10.2
GPU: GeForce RTX 2080

------------------------------------------------------------------------

Error message 1 (depth=18):

2020-04-09 18:12:35,082 FATAL [default] Check failed: [error == cudaSuccess] out of memory
2020-04-09 18:12:35,082 WARNING [default] Aborting application. Reason: Fatal log at [/home/admin/Desktop/EverComm_ibpem_gpu/thundergbm/src/thundergbm/syncmem.cpp:107]
Aborted (core dumped)

Error message 2 (depth=20):

2020-04-09 18:41:17,156 FATAL [default] Check failed: [size() == source.size()] destination and source count doesn't match
2020-04-09 18:41:17,156 WARNING [default] Aborting application. Reason: Fatal log at [/home/admin/Desktop/EverComm_ibpem_gpu/thundergbm/include/thundergbm/syncarray.h:91]
Aborted (core dumped)

TAOinfoWr commented 4 years ago

Here is my GPU's detailed information:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:65:00.0 Off |                  N/A |
| 24%   34C    P0    23W / 257W |      0MiB / 11018MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Kurt-Liuhf commented 4 years ago

Hi @TAOinfoWr, the out-of-memory error is caused by the large parameter values (i.e., the tree depth and the number of trees) you set. With parameters like these, the training process of GBDT is very memory-consuming, especially for a classification task. We recommend trying to train ThunderGBM with a smaller tree depth and a smaller number of trees. Thank you.
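To see why these settings are so demanding: a depth-18 binary tree can hold up to 2^19 - 1 nodes, and multi-class boosting typically grows one tree per class per round, which is why classification tasks are especially memory-hungry. The following is only a back-of-the-envelope estimate in plain Python (assumed node counts only, not how ThunderGBM actually allocates memory):

# Rough node-count estimate -- an illustration, not ThunderGBM's allocator.
def approx_nodes(depth, n_trees, num_class):
    nodes_per_tree = 2 ** (depth + 1) - 1          # full binary tree of this depth
    return nodes_per_tree * n_trees * num_class    # assumes one tree per class per round

# Settings from this issue: depth=18, n_trees=1000, 7 classes
print(f"{approx_nodes(18, 1000, 7):.2e} nodes")    # ~3.67e+09 -- huge even at a few bytes per node

# Much smaller settings shrink the footprint by several orders of magnitude
print(f"{approx_nodes(6, 100, 7):.2e} nodes")      # ~8.89e+04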

TAOinfoWr commented 4 years ago

Hi @Kurt-Liuhf, thank you for your response. I understand what you mean, but in some cases a larger value of a hyperparameter (e.g., tree depth) gives better predictive performance. If GPU memory optimization could be improved in the future, I think that would be great. Thank you!
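In the meantime, a common GBDT workaround (not specific to ThunderGBM, and only a sketch reusing the synthetic data from above) is to trade depth for more boosting rounds: each extra level roughly doubles the per-tree node budget, while extra rounds only grow memory linearly. The values below are illustrative, to be tuned on a validation set, not recommended defaults:

from thundergbm import TGBMClassifier
import numpy as np

# same synthetic data as in the original script
X = np.random.random((30000, 9)).astype('float32')
y = np.random.randint(0, 7, 30000).astype('int32')

# shallower trees, more rounds (hypothetical values)
clf = TGBMClassifier(depth=8, n_trees=300, verbose=0, bagging=0)
clf.fit(X, y)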