allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

LongformerEncoderDecoder overshooting RAM: triggered OOM after training stably for 6-7 hours #204

Closed — kgarg8 closed this issue 3 years ago

kgarg8 commented 3 years ago

I am using Longformer in the following way:

from transformers.models.led.modeling_led import LEDForConditionalGeneration
from transformers.models.led.tokenization_led import LEDTokenizer

tokenizer = LEDTokenizer.from_pretrained(config["embedding_path"])
tokenized_input = tokenizer.encode(input, truncation=True, max_length=16384)

model = LEDForConditionalGeneration.from_pretrained(config["embedding_path"], gradient_checkpointing=True, return_dict=True)

if not config['generate']:
    outputs = model(input_ids=tokenized_input,
                    labels=...,
                    use_cache=False,  # required when gradient_checkpointing is enabled
                    attention_mask=...,
                    decoder_attention_mask=...)

tokenized_input is on the order of ~5k-18k tokens, but I truncate to a maximum length of 16384.
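(Side note, in case it matters for memory: LED mixes local sliding-window attention with global attention, and the usual advice is to put global attention only on the first `<s>` token. A pure-Python sketch of building such a mask — the helper name is hypothetical, not from my code:)

```python
def led_global_attention_mask(input_ids):
    """Hypothetical helper: build a global_attention_mask for LED.

    Convention: 1 = global attention (first token only),
                0 = local sliding-window attention (everything else).
    """
    mask = [[0] * len(seq) for seq in input_ids]
    for row in mask:
        if row:          # guard against empty sequences
            row[0] = 1
    return mask
```

The result would be passed as `global_attention_mask=...` in the forward call, alongside the regular `attention_mask`.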

GPU: V100 with 16 GB of memory

RAM:

              total        used        free      shared  buff/cache   available
Mem:            59G         39G         10G        40M        9.4G         19G
Swap:            0B          0B          0B

Problem:

After training on ~5500-6000 batches of size 4, the process is automatically killed by the OOM killer.

An important observation from top is that RAM usage was steadily and gradually increasing over time; it was around 80% when I last checked, after around 3000 batches had been processed.

Note that the problem is most probably not CUDA running out of memory, since training had already run stably for 6-7 hours.

I tried smaller sequence lengths like 10k and 5k, but the problem remained.

I also checked my code for a memory leak, but I couldn't find one (to the best of my understanding).
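For what it's worth, a stdlib-only way to confirm whether host RAM really grows per batch is to log the process's peak RSS inside the loop. A sketch (the commented training-loop body is a placeholder, not my actual code); the classic cause of this growth pattern is keeping `loss` tensors — and with them their autograd graphs — alive instead of calling `loss.item()`:

```python
import resource

def peak_rss_mb():
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS);
    # assuming Linux here.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

# Sketch of instrumenting the training loop:
# for step, batch in enumerate(loader):
#     outputs = model(**batch)
#     running_loss += outputs.loss.item()  # .item() frees the graph;
#                                          # storing the tensor itself leaks
#     if step % 100 == 0:
#         print(f"step {step}: peak RSS {peak_rss_mb():.0f} MB")
```

If peak RSS climbs monotonically with the step count even at small sequence lengths, that points to something accumulating across iterations rather than a per-batch allocation being too large.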

Has anyone faced a similar issue before? Any directions to go from here?

wangyongjie-ntu commented 3 months ago

I also encountered this problem when I fine-tuned Longformer on a generated version of the IMDB dataset. Here is the error output:

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation. 
.....
torch.cuda.OutOfMemoryError: CUDA out of memory. 

I had set max_length in my code:

tokenizer = LongformerTokenizerFast.from_pretrained('allenai/longformer-base-4096', max_length = 1024)

I suspect the tokenizer has bugs in handling some special words or characters.