bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

DeepSpeedCheckpoint needs to support bf16 optimizer states. #280

Open thomasw21 opened 2 years ago

thomasw21 commented 2 years ago

The prefix referenced at https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/a72225908e9bbda4d989bcdecd71c3c4a05a7f71/tools/convert_checkpoint/deepspeed_checkpoint.py#L5 seems wrong, since checkpoint files generated with the bf16 optimizer use `bf16_zero_pp_rank` as their prefix.
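
A minimal sketch of one way the converter could handle both layouts, picking whichever ZeRO shard prefix is actually present in the checkpoint directory. The constant values and helper names (`ZERO_FILE_PREFIX`, `get_zero_file_prefix`, `get_zero_files`) are assumptions for illustration, not the actual code in `deepspeed_checkpoint.py`:

```python
import glob
import os

# Assumed prefixes; the exact constant in deepspeed_checkpoint.py may differ.
ZERO_FILE_PREFIX = 'zero_pp_rank_'
BF16_ZERO_FILE_PREFIX = 'bf16_' + ZERO_FILE_PREFIX


def get_zero_file_prefix(checkpoint_dir):
    """Return the ZeRO shard prefix actually present in checkpoint_dir.

    Falls back to the non-bf16 prefix when no bf16 shards are found.
    """
    bf16_files = glob.glob(os.path.join(checkpoint_dir, BF16_ZERO_FILE_PREFIX + '*'))
    return BF16_ZERO_FILE_PREFIX if bf16_files else ZERO_FILE_PREFIX


def get_zero_files(checkpoint_dir):
    """Collect all ZeRO partition files, whichever optimizer produced them."""
    prefix = get_zero_file_prefix(checkpoint_dir)
    return sorted(glob.glob(os.path.join(checkpoint_dir, prefix + '*')))
```

Detecting the prefix from the files on disk (rather than hardcoding `zero_pp_rank_`) would let the same conversion path work for fp16 and bf16 checkpoints without a separate flag.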