I have tried converting `roberta-base` and `distilroberta-base` to a Longformer with the `create_long_model()` function from the provided notebook `convert_model_to_long.ipynb`.
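For context, this is roughly how I invoke the conversion; the output path is a placeholder, and the arguments follow the notebook's defaults as far as I recall:

```python
# Sketch of my conversion call, assuming create_long_model() as defined
# in convert_model_to_long.ipynb; 'roberta-base-4096' is a placeholder
# output directory, not my actual path.
model, tokenizer = create_long_model(
    save_model_to='roberta-base-4096',
    attention_window=512,  # sliding-window size, the notebook's default
    max_pos=4096,          # extended maximum sequence length
)
```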
The problem I'm facing is that the converted model produces a CUDA out-of-memory error when I try to fine-tune it for token classification. I can't fit even a single batch with the converted model, while I can fit more than one batch with `allenai/longformer-base-4096` or `roberta-base` on a Tesla T4, which has 16 GB of memory. I also tried fp16 precision and gradient checkpointing, but the converted model always hits CUDA OOM regardless of the size of the base model (both the `roberta-base` and `distilroberta-base` conversions fail).
Any hints on where to look to solve this issue?
I'm trying to train a token classifier with Hugging Face's `run_ner.py` script.
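This is roughly the training command I use; the dataset name and paths here are placeholders rather than my exact setup, and the `--gradient_checkpointing` flag assumes a recent transformers version:

```bash
# Fine-tuning the converted model for token classification with run_ner.py.
# conll2003 and the paths are placeholders, not my actual data/config.
python run_ner.py \
  --model_name_or_path ./roberta-base-4096 \
  --dataset_name conll2003 \
  --max_seq_length 4096 \
  --per_device_train_batch_size 1 \
  --fp16 \
  --gradient_checkpointing \
  --do_train \
  --output_dir ./ner-out
```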
My problem is somewhat similar to https://github.com/allenai/longformer/issues/81