ThilinaRajapakse / pytorch-transformers-classification

Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Apache License 2.0
306 stars 97 forks

How to avoid CUDA out of memory error for large batch sizes? #18

Open phosseini opened 5 years ago

phosseini commented 5 years ago

I have two GPUs (2 x NVIDIA Tesla V100) and I'm running the code in run_model.ipynb on Google Cloud. I get a CUDA out of memory error when I run with a sequence length longer than 128 at larger batch sizes.

Do I need to make any changes to the code to run it on multiple GPUs? Given how many GPUs I have and how much memory each one has, I don't think I should be getting the out-of-memory error (please correct me if I'm wrong).

ThilinaRajapakse commented 5 years ago

The code in this repo was not written to support multi-GPU training (mainly because I only have the one). However, the code this is based on does support multiple GPUs, so you should be able to get it working with only a few changes.
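For reference, the usual change is just wrapping the model in `torch.nn.DataParallel`, which is the pattern the upstream Pytorch-Transformers examples use. A minimal sketch, assuming `model` is the classifier built in the notebook (the variable names here are placeholders, not the repo's actual code):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Split each batch across all visible GPUs.
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)
model.to(device)

# Inside the training loop, DataParallel returns one loss per GPU,
# so reduce to a scalar before calling backward():
# loss = outputs[0]
# if isinstance(model, torch.nn.DataParallel):
#     loss = loss.mean()
# loss.backward()
```

Note that `DataParallel` splits each batch across replicas of the model, so it raises the total batch size you can fit, not the maximum sequence length a single GPU can handle.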