Hm that is strange. Could you try using normal FlairEmbeddings
instead of the pooled embeddings? Does the same error occur?
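For reference, a minimal sketch of that swap (standard Flair classes; the model names shown are the usual English news models and only illustrative):

from flair.embeddings import WordEmbeddings, FlairEmbeddings

embedding_types = [
    WordEmbeddings('glove'),
    # plain contextual string embeddings instead of the pooled variant
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
]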
Yes @alanakbik, the same error occurs. However, I was able to continue with the following inferior changes:
from flair.embeddings import WordEmbeddings, PooledFlairEmbeddings

embedding_types = [  # list head and imports reconstructed for completeness
    # classic GloVe word embeddings
    WordEmbeddings('glove'),
    # contextual string embeddings, forward
    PooledFlairEmbeddings('news-forward', pooling='min', chars_per_chunk=64),
    # contextual string embeddings, backward
    PooledFlairEmbeddings('news-backward', pooling='min', chars_per_chunk=64),
]
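(Side note, not from the original comment: such a list is usually wrapped in a StackedEmbeddings object before it is handed to the tagger, e.g.:)

from flair.embeddings import StackedEmbeddings

# combine the individual embeddings into a single stacked embedding
embeddings = StackedEmbeddings(embeddings=embedding_types)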
I would prefer not to limit chars_per_chunk to 64; chars_per_chunk=128 also fails.
Ah, thanks for reporting this. Is there a specific reason you would like more chars per chunk? The chars_per_chunk
parameter does not affect model accuracy; it is just a speed/memory tradeoff parameter.
Yes, it slows down the process. May I know how much time it takes to complete, say, 150 epochs for an NER task using these pooled embeddings? Also, it generates very big models.
Hello guys,
I am facing a similar runtime error while using the BERT multilingual embeddings. Please find my error snippet below:
My text data is in multiple languages (English, Chinese, Japanese, Korean) and I'm not sure if I can try any other pre-trained word embeddings for this. I would like to understand if you have a workaround for this.
Hello @tsu3010, this likely happens because a mini-batch that requires too much GPU memory is pushed through the BERT model, i.e. some texts in the dataset are too long and the mini-batch size is too large (see issue 549).
You could try reducing the mini-batch size from 32 to 8. You could also filter or truncate long texts in the dataset so that a mini-batch fits into memory.
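For illustration, a minimal sketch of lowering the mini-batch size in a standard Flair training setup (the output path is a placeholder, and the tagger and corpus objects are assumed to be set up as usual):

from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/example-ner',
              mini_batch_size=8,   # reduced from the default of 32
              max_epochs=150)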
Thanks for the quick response Alan.
Tried a couple of things:
1) Reduced the mini-batch size (tried 8 and 4)
2) Filtered out long texts in the corpus based on token count using the following snippet,
max_tokens = 512
corpus._train = [x for x in corpus.train if len(x) < max_tokens]
corpus._dev = [x for x in corpus.dev if len(x) < max_tokens]
corpus._test = [x for x in corpus.test if len(x) < max_tokens]
following the suggestion in #387 by @stefan-it.
These are my corpus stats after applying the filter,
{ "TRAIN": { "dataset": "TRAIN", "total_number_of_documents": 44473, "number_of_tokens_per_tag": {}, "number_of_tokens": { "total": 2593040, "min": 5, "max": 511, "avg": 58.3059384345558 } }, "TEST": { "dataset": "TEST", "total_number_of_documents": 9522, "number_of_tokens_per_tag": {}, "number_of_tokens": { "total": 555246, "min": 5, "max": 503, "avg": 58.311909262759926 } }, "DEV": { "dataset": "DEV", "total_number_of_documents": 9535, "number_of_tokens_per_tag": {}, "number_of_tokens": { "total": 550335, "min": 5, "max": 493, "avg": 57.717357105401156 } } }
Surprisingly, this still leads to the error raised in #392 due to sequence length, despite filtering the corpus on token count.
Thanks for posting these results - could you try smaller values for max_tokens in the filtering script? Just as a sanity check, could you run it with max_tokens=10?
One problem is that it is only possible to count "real" tokens here. BERT uses subwords, so the number of subtokens is always greater than or equal to the number of tokens. I would also recommend using a smaller threshold :)
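If it helps, here is a rough sketch of filtering on subword count instead of token count (this assumes the Hugging Face tokenizer for the same multilingual BERT model; it is an illustration, not code from this thread):

from transformers import BertTokenizer

# tokenizer matching the multilingual BERT embeddings
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
max_subtokens = 510  # leave room for the [CLS] and [SEP] special tokens

def subtoken_count(sentence):
    # count BERT wordpieces rather than Flair tokens
    return len(tokenizer.tokenize(sentence.to_plain_string()))

corpus._train = [x for x in corpus.train if subtoken_count(x) < max_subtokens]
corpus._dev = [x for x in corpus.dev if subtoken_count(x) < max_subtokens]
corpus._test = [x for x in corpus.test if subtoken_count(x) < max_subtokens]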
Good catch. BERT subwords were indeed the cause (I set the max count to 512 earlier without factoring this in!). Training works fine at smaller thresholds!
Cheers, and thanks for this really handy library :)
About the CUDA issues: doesn't PyTorch manage CUDA, rather than it being handled at the Flair level? Also, GPU memory is known to get fragmented with PyTorch. I thought a nudge might help you look at other options.
I reduced batch_size to... 4 and it worked... :(
I'm building a dog breed classification model and I'm getting the same CUDA out of memory error. I'm using the TensorFlow backend for training and a CNN.
Hi, I had the same issue with 150 by 150 images. The error message made no sense, but when I made the batch size smaller (from 256 to 16), the issue resolved itself.
I have tried setting drop_out=0.5.
You just need to clean the cache. It worked for me:
import torch, gc

gc.collect()
torch.cuda.empty_cache()
Getting CUDA out of memory errors for the following:
I am getting the above error for this GPU config:
For the following training configuration: