bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Load Bloom Optimizer State (i.e. Bloom 1B1) #350

Open philippmtk opened 1 year ago

philippmtk commented 1 year ago

Hi,

I want to continue training the Bloom model. To start simple, I want to load the 1.1B model into the BigScience Megatron-DeepSpeed library.

I tried to run pretrain_gpt.py with the "load" argument set to the path of the 1B1 optimizer states (from https://huggingface.co/bigscience/bloom-1b1-optimizer-states/tree/main/global_step660750).

It is complaining that there is no meta file available:

(screenshot: error_load_bloom1b1)

Before I continue to debug or make changes to this library, I was just wondering whether there is a better or already-implemented way to load the optimizer state.

Best wishes Philipp

heraclex12 commented 1 year ago

I think you should manually create a `latest` file that contains the checkpoint folder name, e.g. global_step660750.
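In case it helps, here is a minimal sketch of what that could look like. The directory path is a hypothetical placeholder for wherever you downloaded the optimizer-states repo; as far as I know, DeepSpeed resolves the checkpoint tag from a plain-text file named `latest` in the checkpoint root.

```python
import os

# Hypothetical local path to the downloaded bloom-1b1-optimizer-states repo;
# it should contain the global_step660750/ folder with the checkpoint shards.
ckpt_root = "/path/to/bloom-1b1-optimizer-states"

# Write the checkpoint folder name (the "tag") into a file called "latest"
# so that the loader knows which sub-folder to pick up.
with open(os.path.join(ckpt_root, "latest"), "w") as f:
    f.write("global_step660750")
```

Then point the "load" argument at `ckpt_root` (the parent directory), not at the global_step660750 folder itself.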

hatvn commented 1 year ago

> I think you should manually create a `latest` file that contains the checkpoint folder name, e.g. global_step660750.

Have you tried it? If so, can you tell me exactly what the content of your manually created `latest` file is?