misska1 opened this issue 2 years ago
You can convert the HF checkpoints back to Megatron-DeepSpeed. See this (a bit hacky) script: https://gist.github.com/malteos/c194368594e16439c101b7bf27195fd1
@malteos Thank you for your answer! However, in your code I need to specify a DeepSpeed checkpoint:
```python
"checkpoint_dir",
type=str,
help="Path to the DeepSpeed checkpoint directory",
```
But I do not have DeepSpeed checkpoints for the 6.3B, 2.5B, or 1.3B models to compare against.
The script updates the weights of the DeepSpeed checkpoint directly on disk with the weights from a HF checkpoint. So you just need to save an untrained DS checkpoint and update it afterwards. You can use the existing SLURM scripts for that and simply set the number of training steps to 1.
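The core idea of the in-place update can be sketched as follows. This is a minimal, dependency-free illustration, not the gist's actual code: the real DeepSpeed layer files are torch checkpoints read with `torch.load` and written back with `torch.save`, and the file name and keys below are made up for the example; plain `pickle` stands in so the sketch is self-contained.

```python
import os
import pickle
import tempfile

def update_ds_checkpoint(ds_path, hf_state_dict):
    """Overwrite matching weights in a (mock) DeepSpeed checkpoint file in place.

    The real script uses torch.load/torch.save on the DS layer files;
    pickle is used here only to keep the sketch dependency-free.
    """
    with open(ds_path, "rb") as f:
        ds_state = pickle.load(f)
    for name, weight in hf_state_dict.items():
        if name in ds_state:  # only replace tensors the DS layout already knows
            ds_state[name] = weight
    with open(ds_path, "wb") as f:  # write back to the same file on disk
        pickle.dump(ds_state, f)

# Usage: first save an "untrained" checkpoint, then patch it with HF weights.
tmp = os.path.join(tempfile.mkdtemp(), "layer_00-model_00-model_states.pt")
with open(tmp, "wb") as f:
    pickle.dump({"word_embeddings.weight": [0.0, 0.0]}, f)

update_ds_checkpoint(tmp, {"word_embeddings.weight": [0.1, -0.2]})
with open(tmp, "rb") as f:
    print(pickle.load(f)["word_embeddings.weight"])  # [0.1, -0.2]
```

The key design point is that the untrained DS checkpoint supplies the correct file layout and tensor names for your parallelism configuration; the update only swaps in the HF weight values.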
Does BigScience also provide the original BLOOM checkpoints (without the conversion to Hugging Face 🤗)? I am working on fine-tuning BLOOM (6.3B, 2.5B, 1.3B) and I need those checkpoint files. issues/315
In https://github.com/bigscience-workshop/bigscience/tree/master/train/tr1-13B-base I found some URLs, but they are all offline.