Gradient overflow - Githubissues

RiverMount commented 3 years ago

Hi! I am using your code for a text2text task with the BART configuration, but during training a gradient overflow occurs in every epoch.

I also find that every epoch except the last one fails to process all of its steps and skips the remaining ones.

If you see this, please reply soon. Thanks!

jayded commented 3 years ago

Hi,

Apologies for not seeing this; please tag me directly in any issues. I reproduced these models after installing from requirements.txt. It is crucial that the installed package versions are exactly the ones we specify.

I ran my models out to 20 epochs, and (1) did not see any major difference in the val losses vs. 8 epochs, and (2) did not observe this behavior.

If you're still working on this, please verify that your environment matches the requirements.txt file and try running to 8 epochs. Alternatively, download the models that we've released; see the README.

Jay

jayded commented 3 years ago

Closing - this issue is stale. Please re-open if this becomes active again.

allenai / ms2

Gradient overflow #6