allenai / ms2


Gradient overflow #6

Closed RiverMount closed 3 years ago

RiverMount commented 3 years ago

Hi! I'm using your code for a text2text task with the BART configuration, but during training I get the following:

image

If you see this, please reply soon. Thanks!

jayded commented 3 years ago

Hi,

Apologies for not seeing this - please tag me directly in any issues. I reproduced these models after installing from requirements.txt - it is crucial that the installed versions match the ones we specify.

I ran my models out to 20 epochs, and (1) did not see any major difference in the val losses vs. 8 epochs, and (2) did not observe this behavior.

If you're still working with this, please verify that your environment matches the requirements.txt file and try running to 8 epochs. Or download the models that we've released; see the README.
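One way to check an environment against requirements.txt is sketched below. It is a minimal illustration, not part of the repository: the helper names (`parse_pins`, `installed_version`, `find_mismatches`) are hypothetical, and it assumes the file pins packages with plain `name==version` lines. Anything else (unpinned requirements, `-r` includes, URLs) is skipped rather than checked.

```python
from importlib import metadata


def parse_pins(lines):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for raw in lines:
        # Drop trailing comments and environment markers, then whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line or line.startswith("-"):
            continue  # skip blanks, pip options, and unpinned requirements
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(lines, lookup=installed_version):
    """Return {package: (pinned, installed)} for every pin that does not match."""
    mismatches = {}
    for name, pinned in parse_pins(lines).items():
        found = lookup(name)
        if found != pinned:  # strict string comparison, assumed sufficient here
            mismatches[name] = (pinned, found)
    return mismatches
```

Calling `find_mismatches(open("requirements.txt"))` from the repository root would return an empty dict when every pinned package is installed at exactly the pinned version. Because the comparison is a strict string match, a version reported with an extra build suffix would be listed as a mismatch and would need a manual look.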

Jay

jayded commented 3 years ago

Closing - this issue is stale. Please re-open if this becomes active again.