baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.
https://huggingface.co/baichuan-inc/baichuan-7B
Apache License 2.0

[Question] DeepSpeed ZeRO-3 save_checkpoint() produces empty model_states files #132

Open mynewstart opened 12 months ago

mynewstart commented 12 months ago

Questions

Hi, I used this repository's code to continue pretraining the model with DeepSpeed ZeRO-3. However, the checkpoint file zero_pprank*_mp_rank_00_model_states.pt appears to be empty: it contains only the parameter names and shapes, not the weights. Has anyone run into this problem, and how can it be fixed?

Thanks!

hmtbgc commented 11 months ago

I ran into the same problem. My solution was to use DeepSpeed ZeRO-2 instead of ZeRO-3.
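
For reference, the stage switch might look roughly like this in a DeepSpeed config. This is only a sketch: `with_zero_stage` is a hypothetical helper, and `base_config` is a made-up minimal config, not the one used in this repository.

```python
import copy
import json


def with_zero_stage(ds_config, stage):
    """Return a copy of a DeepSpeed config dict with the ZeRO stage changed."""
    new_config = copy.deepcopy(ds_config)  # leave the caller's config untouched
    new_config.setdefault("zero_optimization", {})["stage"] = stage
    return new_config


# Hypothetical minimal config; a real training config has more fields.
base_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 3},
}

zero2_config = with_zero_stage(base_config, 2)
print(json.dumps(zero2_config, indent=2))
```

Note that ZeRO-2 keeps a full copy of the parameters on every GPU, so it needs more memory per device than ZeRO-3.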

mynewstart commented 11 months ago

My solution is to save the checkpoints myself. Alternatively, you can use zero_to_fp32.
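
A minimal sketch of the zero_to_fp32 route, assuming DeepSpeed and PyTorch are installed and the checkpoint was written by `save_checkpoint()`. `resolve_tag` and `consolidate_zero_checkpoint` are hypothetical helper names; `get_fp32_state_dict_from_zero_checkpoint` comes from `deepspeed.utils.zero_to_fp32`, and the `latest` file is the tag file DeepSpeed writes into the save directory.

```python
from pathlib import Path


def resolve_tag(checkpoint_dir, tag=None):
    """Return the checkpoint tag, falling back to DeepSpeed's `latest` file."""
    if tag is not None:
        return tag
    latest = Path(checkpoint_dir) / "latest"
    if not latest.is_file():
        raise FileNotFoundError(f"no 'latest' file found in {checkpoint_dir}")
    return latest.read_text().strip()


def consolidate_zero_checkpoint(checkpoint_dir, output_file, tag=None):
    """Merge the ZeRO shards into one fp32 state dict and save it to output_file."""
    # Imported lazily so resolve_tag() works without DeepSpeed or PyTorch installed.
    import torch
    from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

    tag = resolve_tag(checkpoint_dir, tag)
    state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag=tag)
    torch.save(state_dict, output_file)
    return state_dict
```

Usage would be something like `consolidate_zero_checkpoint("output/ckpt", "pytorch_model.bin")`, where both paths are placeholders. The merge runs on CPU and needs enough host memory to hold the full fp32 model.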

haorannlp commented 5 months ago

mynewstart wrote: "My solution is to save checkpoints by myself or you can use zero_to_fp32"

@mynewstart I found that my converted checkpoint global_step_xxx contains meaningful *optim_states.pt files but only empty *model_states.pt files. Any clues on this? Thanks.
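
A quick way to see what a checkpoint folder actually holds is to list the two file families with their sizes. The sketch below uses only the standard library; `summarize_checkpoint` is a hypothetical helper name. As far as I understand zero_to_fp32.py, under ZeRO-3 the *model_states.pt files carry parameter shapes and buffers while the partitioned fp32 weights sit in the *optim_states.pt files, so small model_states files next to large optim_states files may be the expected layout rather than a failed save.

```python
from pathlib import Path


def summarize_checkpoint(tag_dir):
    """Group the .pt files in a checkpoint tag folder by kind, with byte sizes."""
    summary = {"model_states": [], "optim_states": []}
    for path in sorted(Path(tag_dir).glob("*.pt")):
        for kind in summary:
            if path.name.endswith(f"_{kind}.pt"):
                summary[kind].append((path.name, path.stat().st_size))
    return summary


def print_summary(tag_dir):
    """Print one line per checkpoint file so the size difference is easy to spot."""
    for kind, files in summarize_checkpoint(tag_dir).items():
        for name, size in files:
            print(f"{kind:13s} {size:>14,d} bytes  {name}")
```

Calling `print_summary` on the global_step_xxx folder should show whether the optim_states files account for nearly all of the checkpoint's size.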