Xtra-Computing / thundersvm

ThunderSVM: A Fast SVM Library on GPUs and CPUs
Apache License 2.0

Handling large datasets #121

Closed naveenmiriyalu closed 5 years ago

naveenmiriyalu commented 5 years ago

Hi, I am doing a system performance study of GPU-accelerated SVMs. I am using a 48 GB dataset for classification (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.bz2). The actual dataset size is 11 GB; I replicated the same data about four times to reach 48 GB. A minimal sketch of the replication step is below (file names are placeholders, not my actual paths; LIBSVM files are line-oriented text, so plain concatenation yields a valid larger dataset):
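
# Sketch of the replication step; paths are placeholders.
# Stream in chunks so the 11 GB source is never loaded whole.
with open("big_epsilon_normalized_48G", "wb") as out:
    for _ in range(4):
        with open("epsilon_normalized", "rb") as src:
            while chunk := src.read(1 << 20):  # 1 MiB at a time
                out.write(chunk)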

I have tried with -m 8192 and -m 16384. Both give an out-of-memory error. My GPU has 16 GB of memory.

So can ThunderSVM handle datasets that don't fit into GPU memory? Or do I need to supply any additional command-line parameters?

Here is the log of the failure:

./thundersvm-train -c 3 ./../../data/big_epsilon_normalized_48G
INFO [default] loading dataset from file "./../../data/big_epsilon_normalized_48G"
INFO [default] #instances = 1600000, #features = 2000
INFO [default] training C-SVC
INFO [default] C = 3
WARNING [default] using default gamma=0.0005
INFO [default] #classes = 2
FATAL [default] out of memory, you may try "-m memory size" to constrain memory usage
WARNING [default] Aborting application. Reason: Fatal log at [/home/tf/ThunderSVM/thundersvm/src/thundersvm/thundersvm-train.cpp:116]
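For scale, a back-of-envelope estimate (my own arithmetic, not from the ThunderSVM logs, and assuming dense float32 storage) shows why the data alone nearly fills a 16 GB card before any kernel cache is allocated:

# Rough footprint estimate; storage format is an assumption.
instances, features = 1_600_000, 2_000
dense_bytes = instances * features * 4     # float32 values
print(f"{dense_bytes / 1024**3:.1f} GiB")  # ~11.9 GiB; a CSR sparse
                                           # copy with 4-byte column
                                           # indices would roughly double this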

shijiashuai commented 5 years ago

Hi. Currently ThunderSVM does not support datasets larger than GPU memory. The whole dataset is copied to GPU memory at the beginning of training; otherwise there would be a lot of PCIe data transfer during training.
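Given that constraint, one possible workaround (my own sketch, not a ThunderSVM feature) is to train on a random subsample that does fit in GPU memory. This uses ThunderSVM's scikit-learn-style Python wrapper and assumes it accepts SciPy sparse input; the path and sample size are placeholders you would tune to your card:

# Hypothetical workaround: subsample to fit GPU memory.
import numpy as np
from sklearn.datasets import load_svmlight_file
from thundersvm import SVC  # ThunderSVM's Python wrapper

X, y = load_svmlight_file("epsilon_normalized")  # placeholder path

rng = np.random.default_rng(0)
idx = rng.choice(X.shape[0], size=200_000, replace=False)

clf = SVC(C=3, kernel="rbf", gamma=0.0005)
clf.fit(X[idx], y[idx])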


naveenmiriyalu commented 5 years ago

Thanks for the clarification @shijiashuai