ai-forever / ru-gpts

Russian GPT3 models.
Apache License 2.0

Experimenting with finetuning GPT2Large on Colab's V100 #25

Closed fen0s closed 3 years ago

fen0s commented 3 years ago

Welp, I've got it to finetune the model, but something seems off. When trying to generate anything with the finetuned model, I get an error about the probability being zero, negative, or infinite. It seems to be caused either by block_size < 1024 or by the Apex O3 optimization level. Is there anything I can do to fix this? Feels like I'm so close to actually getting it to work, but every time something goes wrong :/ [screenshot of the error]
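
That error is the one torch.multinomial raises when the softmax input contains NaN or inf values, which pure-fp16 (O3) training can easily produce. A minimal check, assuming transformers >= 4 and that the tokenizer was saved into the output directory:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_dir = "../checkpointss"  # output_dir from the command below
tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
model = GPT2LMHeadModel.from_pretrained(model_dir).eval()

inputs = tokenizer("Проверка", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch, seq_len, vocab_size)

# Sampling fails with "probability tensor contains either `inf`, `nan`
# or element < 0" exactly when the logits are non-finite.
print("finite logits:", torch.isfinite(logits).all().item())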

Config I use for training:

!cd ru-gpts && python pretrain_transformers.py \
    --output_dir=../checkpointss \
    --model_type=gpt2 \
    --model_name_or_path=../gpt2_large_bbpe_v50 \
    --do_train \
    --train_data_file=/content/dataset.txt \
    --fp16 \
    --fp16_opt_level O3 \
    --per_gpu_train_batch_size 1 \
    --num_train_epochs 2 \
    --block_size=768 \
    --overwrite_output_dir

Considering others have successfully trained Large and even XL on Colab GPUs, I think it is actually possible, and the difference in quality compared with GPT-3 Small, which is offered for finetuning, is drastic.

king-menin commented 3 years ago

Try decreasing fp16_opt_level.
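
For context, a minimal sketch of what the Apex opt levels mean (assuming Apex is installed); O3 is pure fp16, which is fast but numerically unsafe for training:

import torch
from apex import amp  # NVIDIA Apex mixed-precision utilities

model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# O1: selected ops run in fp16, fp32 master weights (recommended default)
# O2: model cast to fp16, fp32 master weights and batchnorm
# O3: pure fp16 everywhere -- fastest, but prone to NaN/inf activations
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")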

fen0s commented 3 years ago

> Try decreasing fp16_opt_level.

Yeah, the problem here is that with O2 it still won't train and goes OOM, even at smaller block sizes. The weird thing is that gradient checkpointing is present in your model implementation, but somehow it doesn't cut memory use enough for the larger models to run on hardware with less VRAM, which is really strange... Still trying different things to make it work.
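
For what it's worth, checkpointing has to be switched on explicitly in the HF models; a sketch, assuming a recent transformers release (older versions instead take gradient_checkpointing=True in the model config):

from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("../gpt2_large_bbpe_v50")
model.gradient_checkpointing_enable()  # recompute activations in backward
model.train()  # checkpointing only takes effect during training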

Dmitriuso commented 3 years ago

Hey guys, @fen0s I'm trying to do the same thing - fine-tune the GPT2 Large model in Colab, but I run out of RAM just like you do. I tried to work around it with a while True loop, but that doesn't seem to work either. Did you manage to find something that makes it work? 🤞

king-menin commented 3 years ago

Try the DeepSpeed version of the script.
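
For anyone landing here, a hedged sketch of what the DeepSpeed route looks like; this is not the repo's actual script, the config values are illustrative, and passing a dict via config= assumes a recent DeepSpeed release. The fp16 mode uses dynamic loss scaling rather than a fixed Apex opt level, which sidesteps the O3 failure mode above.

import deepspeed
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("../gpt2_large_bbpe_v50")

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},          # dynamic loss scaling, no opt_level
    "zero_optimization": {"stage": 2},  # shards optimizer state + gradients across GPUs
    "optimizer": {"type": "AdamW", "params": {"lr": 5e-5}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
# In the train loop, engine.backward(loss) and engine.step()
# replace loss.backward() and optimizer.step().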