Torch takes almost all the memory even on large GPU

I tried to run code and pretrained network provided in this notebook, https://colab.research.google.com/github/sberbank-ai/ru-gpts/blob/master/examples/Finetune_RuGPTs_with_HF.ipynb but I can't run more then one batch per GPU with my data. It is not very big - 90 MB, but it takes forever to do with 1 batch per step.

So, every time I run this command

!CUDA_VISIBLE_DEVICES=0 python ru-gpts/pretrain_transformers.py \
    --output_dir=models/essays \
    --model_type=gpt2 \
    --model_name_or_path=sberbank-ai/rugpt3small_based_on_gpt2 \
    --do_train \
    --train_data_file=train.txt \
    --do_eval \
    --eval_data_file=valid.txt \
    --per_gpu_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --num_train_epochs 5 \
    --block_size 2048 \
    --overwrite_output_dir

with --per_gpu_train_batch_size > 1 I get an error RuntimeError: CUDA out of memory.. The error shows that >90% of memory is allocated to torch and the rest is not enough to run multiple batches. And this happens on GPUs with any amount of memory: 10, 15, 30 GB.

Could you please fix the amount of memory preallocated to pytorch.

ai-forever / ru-gpts

Torch takes almost all the memory even on large GPU #57