amandaluof closed this issue 9 months ago
This may come from gradient checkpointing.
Do you have gradient_checkpointing enabled during stage 1 training? We turned it off, so some workarounds may be needed to avoid errors when gradient_checkpointing is enabled. We haven't investigated this thoroughly yet.
Yes, I tried to turn on gradient_checkpointing. Thanks for your reply.
Many thanks for releasing the training code.
However, after following the environment setup and data preparation steps and then running the stage 1 training command, I got the error shown in the following screenshot. Is there anything wrong?
Looking forward to your reply. Thank you again!