Xtra-Computing / thundersvm

ThunderSVM: A Fast SVM Library on GPUs and CPUs
Apache License 2.0

Memory Leak #177

Status: Closed (AJ2019 closed this issue 4 years ago)

AJ2019 commented 4 years ago

Hi, I am training multiple SVMs, and the GPU memory used by the SVMs that have already been trained is not fully released. As a result, the GPU runs out of memory. Can you please tell me how I can debug this issue?

I am using Ubuntu 16.04 and I tested with the latest thundersvm code as well.

Additional info about the SVM which did not release GPU memory (~600 MB):

str2 =

1×7 cell array

{'thundersvm-train'}    {'-c'}    {'0.01'}    {'-t'}    {'0'}    {'feat.train'}    {'model'}

2019-11-13 19:20:13,971 INFO [default] loading dataset from file "feat.train"
2019-11-13 19:20:14,259 INFO [default] #instances = 66306, #features = 32
2019-11-13 19:20:14,267 INFO [default] #classes = 2
2019-11-13 19:20:14,308 INFO [default] working set size = 1024
2019-11-13 19:20:14,310 INFO [default] training start
2019-11-13 19:20:14,340 INFO [default] global iter = 0, total local iter = 513, diff = 2
2019-11-13 19:20:14,759 INFO [default] global iter = 52, total local iter = 11793, diff = 0.000890684
2019-11-13 19:20:14,759 INFO [default] training finished
2019-11-13 19:20:14,759 INFO [default] obj = -167.119
2019-11-13 19:20:14,766 INFO [default] rho = 0.99953
2019-11-13 19:20:14,767 INFO [default] #sv = 16791
2019-11-13 19:20:14,783 INFO [default] #total unique sv = 16791
2019-11-13 19:20:15,018 INFO [default] evaluating training score
2019-11-13 19:20:15,288 INFO [default] Accuracy = 0.873978

More analysis from my side

I think the code does not release all of the GPU memory when there are a lot of vectors. I have ~8K signal vectors and ~58K noise vectors.

Training with ~8K signal and up to ~20K noise vectors works fine. When I train the SVM with 8K signal and 30K noise vectors, the GPU memory is not fully released.

There must be a minor bug in the code. Can you please look into it?

Thanks, Aditya

zeyiwen commented 4 years ago

We cannot reproduce your issue. I recommend building ThunderSVM in debug mode and running it under cuda-memcheck to see whether the memory leak comes from ThunderSVM.
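A complementary check from outside the library is to sample the GPU's used memory after each training call and see whether it keeps growing. The following is only a minimal sketch under stated assumptions: it relies on `nvidia-smi` being on the PATH, the helper names (`parse_used_mib`, `read_used_mib`, `growth_per_run`, `looks_like_leak`) and the 16 MiB tolerance are hypothetical, and nothing in it is part of ThunderSVM. Note that `nvidia-smi` reports memory per device, so other processes on the same GPU will also show up in the numbers.

```python
import subprocess
from typing import List


def parse_used_mib(smi_output: str) -> List[int]:
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`,
    which prints one integer (MiB) per GPU, one GPU per line."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def read_used_mib() -> List[int]:
    """Query nvidia-smi once and return the used memory (MiB) of each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_used_mib(result.stdout)


def growth_per_run(samples: List[int]) -> List[int]:
    """Difference in used memory between consecutive samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def looks_like_leak(samples: List[int], tolerance_mib: int = 16) -> bool:
    """Heuristic (an assumption, not a proof): flag a leak when every
    consecutive sample grows by more than `tolerance_mib`."""
    deltas = growth_per_run(samples)
    return len(deltas) > 0 and all(delta > tolerance_mib for delta in deltas)
```

To use it, call `read_used_mib()` once before the first training run and once after each subsequent run in the same process, collect the readings for the GPU in question, and pass them to `looks_like_leak`. Steady growth of roughly the same amount per run (for example ~600 MB, as reported above) points at memory that is allocated during training and never freed; a one-time jump after the first run is more likely the CUDA context itself.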

AJ2019 commented 4 years ago

Hi, Thanks for your response. I am pretty sure that the leak is within thundersvm, as the memory leak is observed exactly at the svm train command. I am sharing the feature file and steps to reproduce. If you are not able to reproduce, I will try the debug mode.

The feat.train file is attached: feat.train.tar.gz

Command that leaks GPU Memory: svm_train_matlab({'-c', num2str(0.01), '-t', num2str(0), 'feat.train', 'model'});

System used: Ubuntu 16.04; GPU: Titan X 12GB

Appreciate your help

Regards, Aditya

shijiashuai commented 4 years ago

Hi @AJ2019, I've created a patch, #180. You can check out the pull request and see whether it solves your problem.

AJ2019 commented 4 years ago

Hi @shijiashuai , I used the code with the patch and the issue is still there.

QinbinLi commented 4 years ago

Hi, @AJ2019

We have fixed a memory issue in the interface. You can update the library and try again. Thanks.

AJ2019 commented 4 years ago

Thanks @GODqinbin and @shyhuai for fixing the issue. I am not able to reproduce the issue anymore. I will do more testing in the next few days.

I appreciate your time and effort.