I have tried converting `roberta-base` and `distilroberta-base` to a Longformer with the `create_long_model()` function from the provided notebook `convert_model_to_long.ipynb`.
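For context, this is roughly how I invoke the conversion; the output path is a placeholder, and the arguments follow the notebook's defaults as far as I recall:

```python
# Sketch of my conversion call, assuming create_long_model() as defined
# in convert_model_to_long.ipynb; 'roberta-base-4096' is a placeholder
# output directory, not my actual path.
model, tokenizer = create_long_model(
    save_model_to='roberta-base-4096',
    attention_window=512,  # sliding-window size, the notebook's default
    max_pos=4096,          # extended maximum sequence length
)
```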
The problem I'm facing is that the converted model produces a CUDA out-of-memory error when I try to fine-tune it for token classification. I can't fit even a single batch with the converted model, while I can fit more than one batch with `allenai/longformer-base-4096` or `roberta-base` on a Tesla T4, which has 16 GB of memory. I also tried fp16 precision and gradient checkpointing, but the converted model always hits CUDA OOM regardless of the size of the base model (both the `roberta-base` and `distilroberta-base` conversions fail).
Any hints on where to look to solve this issue?
I'm trying to train a token classifier with Hugging Face's `run_ner.py` script.
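This is roughly the training command I use; the dataset name and paths here are placeholders rather than my exact setup, and the `--gradient_checkpointing` flag assumes a recent transformers version:

```bash
# Fine-tuning the converted model for token classification with run_ner.py.
# conll2003 and the paths are placeholders, not my actual data/config.
python run_ner.py \
  --model_name_or_path ./roberta-base-4096 \
  --dataset_name conll2003 \
  --max_seq_length 4096 \
  --per_device_train_batch_size 1 \
  --fp16 \
  --gradient_checkpointing \
  --do_train \
  --output_dir ./ner-out
```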
My problem is somewhat similar to https://github.com/allenai/longformer/issues/81