yang-ze-kang opened this issue 2 days ago
I see an error in your code. When applying gradient accumulation, `train_loss.backward()` should be called on every batch, with the optimizer step deferred until the accumulation window completes.
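For reference, a minimal sketch of how gradient accumulation is typically wired in PyTorch (the model, data loader, and `accumulation_steps` value here are illustrative assumptions, not the repository's actual code):

```python
import torch

# Hypothetical stand-ins for the repository's real training objects.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = torch.nn.MSELoss()
loader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(8)]

accumulation_steps = 4  # assumed value for illustration

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    train_loss = criterion(model(inputs), targets)
    # backward() runs on EVERY batch; loss is scaled so the
    # accumulated gradient averages over the accumulation window.
    (train_loss / accumulation_steps).backward()
    # The optimizer steps and gradients reset only once per window.
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```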
In fact, this code does not support gradient accumulation, and I did not use that technique. I have removed that part of the code from the repository for now. Thank you very much for your feedback.