Open philippmtk opened 2 years ago
I think you should manually create a file named latest that contains the checkpoint folder name, e.g., global_step660750.
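Something like this should do it (a minimal sketch; the directory path is a placeholder for wherever you downloaded the optimizer states):

```python
from pathlib import Path

# Placeholder: the directory that contains the global_step660750/ sub-folder.
ckpt_dir = Path("/path/to/bloom-1b1-optimizer-states")

# DeepSpeed expects a plain-text file named "latest" next to the
# global_step* folders; its content is just the checkpoint tag.
(ckpt_dir / "latest").write_text("global_step660750")
```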
Have you tried it, and can you tell me exactly what the content of your manually created latest file is?
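For context: as far as I understand, when no explicit checkpoint tag is passed, DeepSpeed finds the folder to restore by reading that latest file, roughly like this (a simplified sketch; the path is a placeholder):

```python
import os

# Placeholder: the directory passed via --load.
load_dir = "/path/to/bloom-1b1-optimizer-states"

# Simplified view of what DeepSpeed does on load when no tag is given:
# read the "latest" file to find which global_step* folder to restore.
with open(os.path.join(load_dir, "latest")) as f:
    tag = f.read().strip()  # e.g. "global_step660750"

checkpoint_folder = os.path.join(load_dir, tag)
```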
Hi,
I want to continue training the Bloom model. To start simple, I want to load the 1.1B model into the BigScience Megatron-DeepSpeed library.
I tried to run pretrain_gpt.py with the --load argument set to the path of the 1b1 optimizer states (from https://huggingface.co/bigscience/bloom-1b1-optimizer-states/tree/main/global_step660750).
It complains that there is no meta file available:
Before I continue debugging or making changes to this library, I was wondering whether there is a better, or already implemented, way to load the optimizer state.
Best wishes,
Philipp