Open skx6 opened 3 weeks ago
Great work! I trained a 13B model, but when I run the following command, an error occurs:
python zero_to_fp32.py . ../pytorch_model.bin
The error is:
RuntimeError: Parent directory ../pytorch_model.bin does not exist.
Is the output of this step a file or a directory? Is this a problem with the DeepSpeed version or the config?
That's because the DeepSpeed version must be 0.15.2 or lower. A change made to zero_to_fp32.py in DeepSpeed 0.15.3 causes this problem.
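In case it helps, here is a minimal sketch for checking whether the installed DeepSpeed falls at or below the 0.15.2 threshold mentioned above. The helper names `parse_version` and `is_compatible` are made up for illustration, and the threshold is taken from this reply rather than from DeepSpeed's documentation, so treat it as an assumption.

```python
import re
from importlib import metadata

# Threshold reported in this thread (an assumption, not an official statement).
MAX_COMPATIBLE = (0, 15, 2)


def parse_version(version: str) -> tuple:
    """Return the leading numeric components, e.g. '0.15.3+cu121' -> (0, 15, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_compatible(version: str) -> bool:
    """True if `version` is at or below the threshold stated in the thread."""
    return parse_version(version) <= MAX_COMPATIBLE


if __name__ == "__main__":
    try:
        installed = metadata.version("deepspeed")
    except metadata.PackageNotFoundError:
        print("deepspeed is not installed in this environment")
    else:
        status = "OK" if is_compatible(installed) else "newer than 0.15.2"
        print(f"deepspeed {installed}: {status}")
```

If the check reports a newer version, pinning with `pip install "deepspeed<=0.15.2"` should get back to a release at or below that threshold.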