Open zkysss11235 opened 6 months ago
When I run the first-stage DPO training, I get the error in the title during backpropagation. I am using the POVID/scripts/run_dpo.sh script and only changed the paths to the models and datasets. This is the line that produces the error:
self.accelerator.backward(loss)
Can anyone help me with this? Many thanks.
Hello, could you please provide more detailed information? I suspect the current error might be due to library version issues.
Hello, thanks a lot for releasing the source code and dataset. I'm having the same issue in stage 1 training, using the same library versions as specified.
Changing from zero2 to zero3 fixed it.
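In case it helps others, here is a rough sketch of what that change amounts to at the config level. It assumes the training script points DeepSpeed at a JSON config file; the example config contents and the helper name `to_zero3` are made up for illustration and are not taken from the POVID repo. The only key that matters here is `zero_optimization.stage`.

```python
import copy
import json


def to_zero3(ds_config: dict) -> dict:
    """Return a copy of a DeepSpeed config dict with the ZeRO stage set to 3.

    Hypothetical helper for illustration; the input dict is left unchanged.
    """
    new_config = copy.deepcopy(ds_config)
    zero = new_config.setdefault("zero_optimization", {})
    zero["stage"] = 3
    return new_config


# Hypothetical ZeRO-2 style config (an assumption; the repo's actual file may differ).
zero2_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {"stage": 2, "overlap_comm": True},
}

zero3_config = to_zero3(zero2_config)
print(json.dumps(zero3_config, indent=2))
```

If the repo already ships a separate ZeRO-3 config file, pointing the script at that file instead is probably the simpler route.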
Can you please confirm whether you are using zero2, zero3, or zero3_offload for training?
Same issue
Same issue