Xtra-Computing / thundersvm

ThunderSVM: A Fast SVM Library on GPUs and CPUs
Apache License 2.0

Fatal out of memory error #151

Status: Open

aaroncaffrey commented 5 years ago

I am running several instances of ThunderSVM's GPU-based cross-validation and training, both in parallel and sequentially. Occasionally one of them aborts with a fatal out-of-memory error. I have a fail-safe that delays and then reruns the failed task, which works, but I would like to understand the issue better and also suggest some improvements.
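For context, my fail-safe is essentially a delay-and-retry wrapper around the training command. A minimal sketch of the idea (the command and retry counts are placeholders for my actual setup; the `runner` parameter is there so the logic can be exercised without the binary):

```python
import subprocess
import time

def run_with_retry(cmd, max_attempts=3, delay_seconds=60, runner=subprocess.call):
    """Run a command; on a non-zero exit (e.g. a fatal out-of-memory error),
    wait and retry up to max_attempts times. Returns the final exit code."""
    code = 1
    for attempt in range(1, max_attempts + 1):
        code = runner(cmd)
        if code == 0:
            return 0
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return code
```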

  1. Could you specify whether this error message refers to GPU memory or system memory? If it can be either, it would also help if the error message itself named the memory type.

  2. Does the command-line '-m memory size' parameter refer to GPU memory or system memory? Is it possible to restrict usage of both types, perhaps via separate parameters? And does restricting the memory size have any negative consequences beyond longer training times, i.e. can it affect the classification metrics of the trained model?
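For reference, this is roughly how I pass the parameter. A hedged sketch, assuming `-m` takes a size value as suggested by the help text; the `build_train_command` helper is my own, not part of ThunderSVM:

```python
def build_train_command(train_file, model_file, memory_size=None, extra_args=()):
    """Assemble a thundersvm-train command line, optionally capping memory
    via the '-m' flag (exact unit per the thundersvm help text)."""
    cmd = ["thundersvm-train"]
    if memory_size is not None:
        cmd += ["-m", str(memory_size)]
    cmd += list(extra_args)
    cmd += [train_file, model_file]
    return cmd
```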

Thank you.

zeyiwen commented 5 years ago

Thanks!

  1. As we can't reproduce your problem, my best guess is that it refers to GPU memory. We will improve the code so the error message reports the memory type.
  2. In your setting it refers to GPU memory, since you are training on GPUs. Because ThunderSVM also runs purely on CPUs, the same option sets the system (host) memory limit in that case.

It is possible to restrict GPU memory consumption in your environment. Constraining host memory, however, is very challenging in your case, because host memory is mainly consumed by storing the training data itself; keeping only part of the training data resident in host memory would be extremely inefficient.
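To illustrate why: with a dense representation, every training sample must stay resident in host memory, so the footprint grows linearly with the dataset. A rough back-of-the-envelope sketch (this helper is purely illustrative, not part of ThunderSVM):

```python
def dense_footprint_bytes(n_samples, n_features, bytes_per_value=4):
    """Approximate host-memory footprint of a dense n_samples x n_features
    feature matrix, assuming 32-bit floating-point values by default."""
    return n_samples * n_features * bytes_per_value

# e.g. 1,000,000 samples with 100 features at 4 bytes each is about 400 MB,
# and that entire block must stay in host memory for the duration of training.
```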

Restricting the memory consumption has no negative impact on the quality of the trained model, so feel free to restrict it if longer training time is acceptable :)