ai-forever / ru-gpts

Russian GPT3 models.
Apache License 2.0
2.08k stars 444 forks

Pretraining of ruGPT3Large issues #18

Closed Markfryazino closed 3 years ago

Markfryazino commented 3 years ago

I am totally confused about the details of GPT3 pretraining. First of all, the file pretrain_ruGPT3Large.sh tries to run a nonexistent Python script, pretrain_gpt2.py. However, in the README, pretrain_gpt2.py has been replaced with pretrain_megatron.py.

Secondly, there are some strange things about GPT3 checkpointing. The pretrain_megatron.py script handles model checkpoints as directories of .pt dumps, whereas generate_ruGPT3Large.py takes as input a directory containing model.bin, vocab.json, etc. Also, it is a directory in this format (not a .pt file) that can be downloaded from Google Drive.
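To make the mismatch concrete, here is a small hypothetical helper (the function name and exact file names are assumptions based on the two layouts described above) that guesses which checkpoint format a directory uses:

```python
import os

def checkpoint_format(path):
    """Guess which checkpoint layout a directory uses.

    Returns "huggingface" for a Transformers-style directory
    (a *.bin weights file plus vocab.json, as expected by
    generate_ruGPT3Large.py), "megatron" for a directory of
    .pt dumps (as expected by pretrain_megatron.py), or
    "unknown" otherwise.
    """
    files = os.listdir(path)
    has_bin = any(f.endswith(".bin") for f in files)
    if "vocab.json" in files and has_bin:
        return "huggingface"
    if any(f.endswith(".pt") for f in files):
        return "megatron"
    return "unknown"
```

Running this on the directory downloaded from Google Drive would report "huggingface", which is why pointing pretrain_megatron.py at it directly does not restore the weights.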

So, how should I fine-tune ruGPT3Large? Simply calling pretrain_megatron.py is obviously not the correct way: at the very least, it starts from random weights because the checkpoint file formats do not match.

king-menin commented 3 years ago

Thank you! The first issue was just a typo on our side; it has been fixed and the script renamed. Try setting the --load-openai parameter to load a Transformers checkpoint if you want to fine-tune the model with Megatron. You can also download the model from Hugging Face Transformers.
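A minimal sketch of the suggested invocation. Only --load-openai comes from the reply above; the other flags and all paths are placeholders/assumptions, so check the renamed script's argument parser for the exact names:

```shell
# Fine-tune from a Hugging Face Transformers-format checkpoint.
# --load-openai tells the script to load a Transformers checkpoint
# instead of a Megatron .pt dump.
# --load / --train-data / --save are assumed flag names (placeholders).
python pretrain_megatron.py \
    --load-openai \
    --load /path/to/rugpt3large_checkpoint_dir \
    --train-data /path/to/train.txt \
    --save /path/to/output_checkpoints
```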