ai-forever / ru-gpts

Russian GPT3 models.
Apache License 2.0

Generating with deepspeed checkpoints #56

drunkinlove closed this issue 3 years ago

drunkinlove commented 3 years ago

Hi! Following _Finetune_and_generate_RuGPTs_deepspeedmegatron.ipynb, I've fine-tuned a model using DeepSpeed. However, I can't use the resulting checkpoint to generate a response, because generate_samples.py throws a KeyError:

A metadata file exists but unable to load model from checkpoint /iter_0030000/mp_rank_00/model_optim_rng.pt, exiting

The checkpoint's structure also looks different from a regular Megatron checkpoint. What can I use to run inference with a DeepSpeed checkpoint?
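One way to see how a DeepSpeed checkpoint differs from a regular Megatron one is to load it on CPU and list its top-level keys. This is a minimal sketch, assuming PyTorch is installed; the helper names summarize_checkpoint and inspect_checkpoint_file are hypothetical and not part of ru-gpts:

```python
from typing import Any, List, Mapping, Tuple


def summarize_checkpoint(checkpoint: Mapping[str, Any]) -> List[Tuple[str, str]]:
    """List the checkpoint's top-level keys (sorted) with the type of each value."""
    return [(key, type(checkpoint[key]).__name__) for key in sorted(checkpoint)]


def inspect_checkpoint_file(path: str) -> List[Tuple[str, str]]:
    """Load a checkpoint file onto the CPU and summarize its top-level structure."""
    import torch  # lazy import: summarize_checkpoint itself does not need torch

    # map_location="cpu" lets you inspect the file without the original GPUs.
    # Assumption: on recent PyTorch releases you may also need weights_only=False,
    # since training checkpoints can pickle non-tensor objects (e.g. training args).
    checkpoint = torch.load(path, map_location="cpu")
    return summarize_checkpoint(checkpoint)
```

For example, calling print(inspect_checkpoint_file(".../iter_0030000/mp_rank_00/model_optim_rng.pt")) with your own full path shows under which key (if any) the model weights were stored.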

Pro100rus32 commented 3 years ago

Hi :) Are you running ruGPT3XL?

drunkinlove commented 3 years ago

@Pro100rus32 No, just the small version.

king-menin commented 3 years ago

If you fine-tune with DeepSpeed, you get DeepSpeed checkpoints. Try using the key model or module to load the model state dict from the DeepSpeed checkpoint.
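A minimal sketch of that suggestion, assuming PyTorch and an already constructed model object with the same architecture used for fine-tuning (model construction is not shown). It tries the model key first, which Megatron-style checkpoints typically use, then module, which DeepSpeed's engine typically uses. The helper names extract_state_dict and load_weights are hypothetical, not ru-gpts functions:

```python
from typing import Any, Mapping

# Keys to try, in order: "model" (typical for Megatron-style checkpoints),
# then "module" (typical for checkpoints written by the DeepSpeed engine).
CANDIDATE_KEYS = ("model", "module")


def extract_state_dict(checkpoint: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the model state dict stored under the first candidate key present."""
    for key in CANDIDATE_KEYS:
        if checkpoint.get(key) is not None:
            return checkpoint[key]
    raise KeyError(
        f"none of {CANDIDATE_KEYS} found; top-level keys are {sorted(checkpoint)}"
    )


def load_weights(model, checkpoint_path: str) -> None:
    """Load weights from a DeepSpeed or Megatron checkpoint into a torch.nn.Module."""
    import torch  # lazy import: extract_state_dict itself does not need torch

    # Load on CPU so the checkpoint can be read without the original GPUs.
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    # strict=True (the default) raises if parameter names do not match,
    # which makes naming mismatches visible instead of silently skipping weights.
    model.load_state_dict(extract_state_dict(checkpoint))
```

If neither key is present, the KeyError message lists the keys the checkpoint actually contains, which is a starting point for adapting the loader.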