I also encountered this problem when I fine-tuned Longformer on a generated version of the IMDB dataset. Here is the error output:
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
.....
torch.cuda.OutOfMemoryError: CUDA out of memory.
I have set the max_length in my code:
tokenizer = LongformerTokenizerFast.from_pretrained('allenai/longformer-base-4096', max_length = 1024)
I suspect the tokenizer has bugs in handling some special words or characters.
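That warning usually means no truncation length was actually registered: as far as I know, `max_length` passed to `from_pretrained` does not set the tokenizer's `model_max_length`, and truncation has to be requested when the text is encoded. A minimal sketch of that call-time usage (the example text and the 1024 limit are placeholders, not your actual setup):

```python
from transformers import LongformerTokenizerFast

tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
# Alternative: set a default limit once on the tokenizer object.
# tokenizer.model_max_length = 1024

text = "replace with one of the IMDB reviews"  # placeholder input

encoded = tokenizer(
    text,
    truncation=True,    # request truncation explicitly
    max_length=1024,    # the limit mentioned above
    padding="max_length",
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 1024])
```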
I am using Longformer in the following way: `tokenized_input` is on the order of ~5k-18k tokens, but I am truncating to length 16384.

GPU: 16GB V100 GPU memory
RAM:

Problem:
After training on ~5500-6000 batches of size 4, the process is automatically killed by an OOM signal.
An important observation from `top` is that RAM usage was increasing steadily over time; it was around 80% when I last checked, after around 3000 batches had been processed. Note that the problem is most probably not due to CUDA running out of memory (the run was already stable for 6-7 hours).
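For reference, the growth can also be tracked programmatically instead of eyeballing `top`, by logging the process RSS every N steps. A minimal sketch, assuming `psutil` is installed; `train_loader`, `model`, and `optimizer` stand in for the real training objects:

```python
import os
import psutil

process = psutil.Process(os.getpid())

for step, batch in enumerate(train_loader):   # train_loader is a placeholder
    loss = model(**batch).loss                # assumes an HF-style model output
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    if step % 100 == 0:
        rss_gib = process.memory_info().rss / 1024 ** 3
        print(f"step {step}: host RSS = {rss_gib:.2f} GiB")
```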
I tried testing with smaller sequence lengths like 10k and 5k, but the problem was still the same. I also tried to check whether there is a memory leak in my code, but there doesn't seem to be one (to the best of my understanding).
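One common cause of gradually growing memory in a training loop is keeping references to graph-connected tensors across steps, e.g. appending the raw `loss` to a list instead of `loss.item()`. A minimal sketch of the leaky pattern and the fix, with placeholder names (this is a general pattern, not something I can confirm from the code here):

```python
losses = []

for step, batch in enumerate(train_loader):   # placeholder loop
    loss = model(**batch).loss

    # Leaky pattern: losses.append(loss) keeps every step's autograd graph
    # reachable, so memory grows with the number of batches.
    losses.append(loss.item())                # .item() stores a plain Python float

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```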
Has anyone faced a similar issue before? Any directions to go from here?