allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150

Memory requirement changes after converting a model using the create_long_model() function #184

Open taufique74 opened 3 years ago

taufique74 commented 3 years ago

I have tried converting roberta-base and distilroberta-base to Longformer with the create_long_model() function from the notebook convert_model_to_long.ipynb.
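For context, this is roughly how I'm calling it, following the notebook (a sketch; the output path and hyperparameter values here are placeholders, and the signature is assumed from convert_model_to_long.ipynb):

```python
# Sketch of the conversion step; create_long_model() is the function
# defined in convert_model_to_long.ipynb (signature assumed, values illustrative).
model_path = "roberta-base-4096"  # placeholder output directory

# Extends the position embeddings to max_pos and replaces self-attention
# with Longformer's sliding-window attention, then saves the result.
model, tokenizer = create_long_model(
    save_model_to=model_path,
    attention_window=512,
    max_pos=4096,
)
```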

The problem I'm facing is that it produces a CUDA out-of-memory (OOM) error when I try to fine-tune it for token classification. I can't fit even a single batch with the converted model, while I can fit more than one batch with allenai/longformer-base-4096 or roberta-base on a Tesla T4 with 16 GB of memory. I also tried fp16 precision and gradient checkpointing, but the converted model always hits CUDA OOM regardless of the model size.
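For reference, this is roughly how I'm enabling those two options (a sketch using the Hugging Face `Trainer` API; note that `gradient_checkpointing` as a `TrainingArguments` flag needs a reasonably recent transformers version, and the values are placeholders):

```python
# Sketch: fp16 + gradient checkpointing via Hugging Face TrainingArguments
# (output_dir and batch size are placeholders).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,  # even a single example per device OOMs
    fp16=True,                      # mixed-precision training
    gradient_checkpointing=True,    # recompute activations to save memory
)
```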

Any hints on where to look to solve this? I'm trying to train a token classifier with Hugging Face's run_ner.py script.
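In case it helps to reproduce, the command looks roughly like this (a sketch; the dataset, paths, and flags are placeholders modeled on recent versions of the run_ner.py example script):

```bash
# Sketch of the fine-tuning invocation (values are placeholders).
python run_ner.py \
  --model_name_or_path roberta-base-4096 \
  --dataset_name conll2003 \
  --output_dir out \
  --do_train --do_eval \
  --per_device_train_batch_size 1 \
  --max_seq_length 4096 \
  --fp16
```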

My problem is somewhat similar to https://github.com/allenai/longformer/issues/81.