flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.90 GiB total capacity; 14.66 GiB already allocated; .... #685

Closed: alipetiwala closed this issue 4 years ago

alipetiwala commented 5 years ago

Getting CUDA out of memory errors for the following:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-75d7ff2333b9> in <module>
     10               learning_rate=0.1,
     11 #               mini_batch_size=16,
---> 12               max_epochs=150,checkpoint=True,anneal_with_restarts=True,embeddings_in_memory=False)
     13 
     14 # 8. plot training curves (optional)

/opt/conda/lib/python3.6/site-packages/flair/trainers/trainer.py in train(self, base_path, evaluation_metric, learning_rate, mini_batch_size, eval_mini_batch_size, max_epochs, anneal_factor, patience, anneal_against_train_loss, train_with_dev, monitor_train, embeddings_in_memory, checkpoint, save_final_model, anneal_with_restarts, test_mode, param_selection_mode, **kwargs)
    148 
    149                 for batch_no, batch in enumerate(batches):
--> 150                     loss = self.model.forward_loss(batch)
    151 
    152                     optimizer.zero_grad()

/opt/conda/lib/python3.6/site-packages/flair/models/sequence_tagger_model.py in forward_loss(self, sentences, sort)
    261     def forward_loss(self, sentences: Union[List[Sentence], Sentence], sort=True) -> torch.tensor:
    262         features, lengths, tags = self.forward(sentences, sort=sort)
--> 263         return self._calculate_loss(features, lengths, tags)
    264 
    265     def forward_labels_and_loss(self, sentences: Union[List[Sentence], Sentence],

/opt/conda/lib/python3.6/site-packages/flair/models/sequence_tagger_model.py in _calculate_loss(self, features, lengths, tags)
    409             tags, _ = pad_tensors(tags)
    410 
--> 411             forward_score = self._forward_alg(features, lengths)
    412             gold_score = self._score_sentence(features, tags, lengths)
    413 

/opt/conda/lib/python3.6/site-packages/flair/models/sequence_tagger_model.py in _forward_alg(self, feats, lens_)
    520             agg_ = torch.log(torch.sum(torch.exp(tag_var), dim=2))
    521 
--> 522             cloned = forward_var.clone()
    523             cloned[:, i + 1, :] = max_tag_var + agg_
    524 

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.90 GiB total capacity; 14.66 GiB already allocated; 7.88 MiB free; 559.48 MiB cached)

I am getting the above error for this GPU config:


Thu Apr 25 10:14:11 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P0    38W / 250W |  16273MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      6705      C   /opt/conda/bin/python                      16263MiB |
+-----------------------------------------------------------------------------+

For the following training configuration:


from typing import List
from flair.embeddings import (StackedEmbeddings, TokenEmbeddings,
                              WordEmbeddings, PooledFlairEmbeddings)

embedding_types: List[TokenEmbeddings] = [

    # GloVe embeddings
    WordEmbeddings('glove'),

    # contextual string embeddings, forward
    PooledFlairEmbeddings('news-forward', pooling='min'),

    # contextual string embeddings, backward
    PooledFlairEmbeddings('news-backward', pooling='min'),
]

embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger

print("initialize sequence tagger")
from flair.models import SequenceTagger

tagger: SequenceTagger = SequenceTagger(hidden_size=256,
                                        embeddings=embeddings,
                                        tag_dictionary=tag_dictionary,
                                        tag_type=tag_type,
                                        use_crf=True)

# 6. initialize trainer (corpus, tag_dictionary and tag_type are created in earlier steps, not shown here)
from flair.trainers import ModelTrainer
trainer: ModelTrainer = ModelTrainer(tagger, corpus)

# 7. start training
trainer.train('resources/taggers/prep/pooled',
              learning_rate=0.1,
              mini_batch_size=16,
              max_epochs=150,
              checkpoint=True,
              anneal_with_restarts=True,
              embeddings_in_memory=False)
alanakbik commented 5 years ago

Hm that is strange. Could you try using normal FlairEmbeddings instead of the pooled embeddings? Does the same error occur?
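
For reference, a minimal sketch of that change, keeping the rest of the training script from the first post unchanged (only the embedding stack is swapped; 'news-forward' and 'news-backward' are the same model names used above):

from typing import List
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings, TokenEmbeddings

# same stack as in the original post, but with non-pooled contextual string embeddings
embedding_types: List[TokenEmbeddings] = [
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]

embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)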

alipetiwala commented 5 years ago

Yes @alanakbik, the same error occurs. However, I was able to continue with the following, somewhat inferior, changes:


embedding_types: List[TokenEmbeddings] = [

    # GloVe embeddings
    WordEmbeddings('glove'),

    # contextual string embeddings, forward
    PooledFlairEmbeddings('news-forward', pooling='min', chars_per_chunk=64),

    # contextual string embeddings, backward
    PooledFlairEmbeddings('news-backward', pooling='min', chars_per_chunk=64),
]

I would prefer not to be limited to chars_per_chunk=64; chars_per_chunk=128 also fails.

alanakbik commented 5 years ago

Ah, thanks for reporting this. Is there a specific reason you would like more chars per chunk? The chars_per_chunk parameter does not affect model accuracy; it is just a speed-memory trade-off parameter.

alipetiwala commented 5 years ago

Yes, it slows down the process. May I know how much time it takes to complete, say, 150 epochs for an NER task using these pooled embeddings? Also, it is generating very big models.

tsu3010 commented 5 years ago

Hello Guys,

I am facing a similar runtime error while using the BERT multilingual embeddings. Please find my error snippet below:

[Screenshot, 2019-05-08: error snippet showing the CUDA out of memory traceback]

My text data is in multiple languages (English, Chinese, Japanese, Korean) and I'm not sure if I can try any other pre-trained word embeddings for this. I would like to understand if you have a workaround for this.

alanakbik commented 5 years ago

Hello @tsu3010, this likely happens because a mini-batch that requires too much GPU memory is pushed through the BERT model, i.e. some texts in the dataset are too long and the mini-batch size is too large (see issue #549).

You could try reducing the mini-batch size from 32 to 8. You could also filter or truncate long texts in the dataset so that a mini-batch fits into memory.
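
A minimal sketch of the mini-batch reduction, assuming a trainer.train(...) call like the one in the first post (mini_batch_size is an existing parameter of train(); the value 8 is just an example):

trainer.train('resources/taggers/prep/pooled',
              learning_rate=0.1,
              mini_batch_size=8,  # reduced from 32 (or 16) to lower peak GPU memory per batch
              max_epochs=150,
              checkpoint=True,
              anneal_with_restarts=True,
              embeddings_in_memory=False)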

tsu3010 commented 5 years ago

Thanks for the quick response Alan.

Tried a couple of things:

1) Reduced the mini-batch size (tried 8 and 4)
2) Filtered out long texts in the corpus based on token count, following the suggestion in #387 by @stefan-it:

max_tokens = 512
corpus._train = [x for x in corpus.train if len(x) < max_tokens]
corpus._dev = [x for x in corpus.dev if len(x) < max_tokens]
corpus._test = [x for x in corpus.test if len(x) < max_tokens]

These are my corpus stats after applying the filter,

{ "TRAIN": { "dataset": "TRAIN", "total_number_of_documents": 44473, "number_of_tokens_per_tag": {}, "number_of_tokens": { "total": 2593040, "min": 5, "max": 511, "avg": 58.3059384345558 } }, "TEST": { "dataset": "TEST", "total_number_of_documents": 9522, "number_of_tokens_per_tag": {}, "number_of_tokens": { "total": 555246, "min": 5, "max": 503, "avg": 58.311909262759926 } }, "DEV": { "dataset": "DEV", "total_number_of_documents": 9535, "number_of_tokens_per_tag": {}, "number_of_tokens": { "total": 550335, "min": 5, "max": 493, "avg": 57.717357105401156 } } }

Surprisingly, this still leads to the sequence-length error raised in #392, despite filtering the corpus on token count.

alanakbik commented 5 years ago

Thanks for posting these results - could you try smaller values for max_tokens in the filtering script? Just as a sanity check, could you run it with max_tokens=10?

stefan-it commented 5 years ago

One problem is that it is only possible to count "real" tokens. BERT uses subwords, so the number of subtokens is always higher than or equal to the number of tokens. I would also recommend using a smaller threshold :)
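
A rough way to filter on subtoken count rather than token count (a sketch, not from this thread: it assumes the Hugging Face BERT tokenizer, shipped as pytorch-pretrained-bert at the time and as transformers today, plus flair's Sentence.to_plain_string()):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')

def subtoken_count(sentence):
    # count WordPiece subtokens instead of flair tokens
    return len(tokenizer.tokenize(sentence.to_plain_string()))

max_subtokens = 510  # leave room for the [CLS] and [SEP] special tokens
corpus._train = [s for s in corpus.train if subtoken_count(s) < max_subtokens]
corpus._dev = [s for s in corpus.dev if subtoken_count(s) < max_subtokens]
corpus._test = [s for s in corpus.test if subtoken_count(s) < max_subtokens]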

tsu3010 commented 5 years ago

One problem is that it is only possible to count "real" tokens. BERT uses subwords, so the number of subtokens is always higher than or equal to the number of tokens. I would also recommend using a smaller threshold :)

Good catch. BERT subwords were indeed the cause (I had set the max count at 512 earlier without factoring this in!). Training works fine at smaller thresholds!

Cheers and Thanks for this really handy library :)

prabhatM commented 5 years ago

About the CUDA issues: doesn't PyTorch manage CUDA, rather than Flair? GPU memory is also known to get fragmented with PyTorch. Thought a nudge might help you look at the other options.
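
For what it's worth, PyTorch exposes counters that separate memory held by live tensors from memory reserved by its caching allocator, which can help tell fragmentation/caching apart from a genuinely full GPU (a small sketch using standard torch.cuda calls; memory_reserved() and memory_summary() assume a reasonably recent PyTorch):

import torch

# memory currently occupied by live tensors
print(f"allocated: {torch.cuda.memory_allocated() / 1024 ** 2:.1f} MiB")

# memory held by the caching allocator (this includes the 'cached' figure from the error message)
print(f"reserved:  {torch.cuda.memory_reserved() / 1024 ** 2:.1f} MiB")

# detailed breakdown of the allocator's pools
print(torch.cuda.memory_summary())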

thusinh1969 commented 5 years ago

I reduced batch_size all the way down to 4 and it worked... :(

Naveenyadav1 commented 5 years ago

I'm building a dog breed classification model and getting the same CUDA out of memory error. I'm using the TensorFlow backend and a CNN for training.

Gaussian01 commented 4 years ago

Hi, I had the same issue with 150 by 150 images. The error message made no sense, but when I made the batch size smaller (from 256 to 16), the issue resolved itself.

v-nhandt21 commented 4 years ago

I have tried setting drop_out=0.5.

ddvika commented 4 years ago

You just need to clean the cache. It worked for me:

import torch, gc

gc.collect()
torch.cuda.empty_cache()

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.