Open gongye19 opened 6 months ago
Could you please share your env config and training command? I re-ran the script and could not reproduce this issue.
Using DeepSpeed with ZeRO-3 and CPU offload causes a problem at the final save step. I switched the CPO stage to ZeRO-2 and the model then saved normally.
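For anyone trying the same workaround, here is a rough sketch of what a ZeRO-2 config without CPU offload might look like. This is not the config from this repo: the variable name `zero2_config` and the helper `to_json` are made up for illustration, and the `"auto"` values assume the run goes through the Hugging Face Trainer DeepSpeed integration, which fills them in from the training arguments.

```python
import json

# Assumed minimal ZeRO-2 config (not taken from this repository).
# No "offload_optimizer" / "offload_param" entries, so nothing is offloaded to CPU.
zero2_config = {
    "zero_optimization": {
        "stage": 2,                    # ZeRO-2: shard optimizer states and gradients
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    # "auto" is resolved by the Hugging Face Trainer integration (assumption:
    # the training script uses it); replace with explicit values otherwise.
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}


def to_json(config):
    """Serialize the config dict to the JSON text DeepSpeed expects in a config file."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(to_json(zero2_config))
```

Save the printed JSON to a file and point the training command's DeepSpeed config argument at it in place of the ZeRO-3 one; the exact argument depends on how the script is launched.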
The model files saved after parallel-sft training finishes are broken: the config file is missing.
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory
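To narrow down what the save step actually wrote, a small check like the one below may help. It is only a sketch: `inspect_checkpoint` is a hypothetical helper, the weight file names are the ones listed in the error above, and `config.json` is assumed to be the missing config file. Newer checkpoints may store weights under other names (for example safetensors files), which would show up under `other_files`.

```python
from pathlib import Path

# File names taken from the OSError message above.
WEIGHT_FILES = (
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
)
# Assumption: the missing "config file" is the usual config.json.
CONFIG_FILE = "config.json"


def inspect_checkpoint(directory):
    """Report which expected files exist in a saved model directory."""
    d = Path(directory)
    # A missing directory is treated as empty rather than raising.
    names = {p.name for p in d.iterdir()} if d.is_dir() else set()
    return {
        "has_config": CONFIG_FILE in names,
        "weight_files": sorted(n for n in WEIGHT_FILES if n in names),
        "other_files": sorted(names - set(WEIGHT_FILES) - {CONFIG_FILE}),
    }


if __name__ == "__main__":
    import sys
    print(inspect_checkpoint(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Posting this output together with the env config and training command should make it easier to tell whether the weights, the config, or both were skipped at save time.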