Open xiaoli1996 opened 1 year ago
Hi, Yuan Gong, great job! I used the same hyperparameters as in your GitHub code, but during training, at "Epoch: [4][160156/161048]", "Train Loss is nan" appears.
The results of the first 3 epochs are 0.415, 0.439, 0.447, compared with the results given in your log: 0.415, 0.439, 0.448, 0.449, 0.449.
My torch version is 2.0.0, so why does this happen?
hi there,
The nan error can be due to an overflow/underflow; it is hard for me to identify the exact reason. It might be related to the pytorch version and the hardware. You could try a couple of workarounds, such as running with a lower version of torch.
-Yuan
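As a general mitigation while debugging, a nan/inf guard in the training loop keeps one overflowing batch from poisoning the model weights. Below is a minimal sketch assuming a standard PyTorch loop; `model`, `batch`, `optimizer`, and `loss_fn` are placeholder names, not taken from the AST code:

```python
import torch

# Optional: make autograd raise at the exact op that produced the nan.
# Helpful for pinpointing an overflow/underflow, but slows training down.
torch.autograd.set_detect_anomaly(True)

def train_step(model, batch, optimizer, loss_fn):
    """One training step that skips batches with a non-finite loss."""
    inputs, targets = batch
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
    # Guard: skip the update instead of backpropagating nan/inf.
    if not torch.isfinite(loss):
        print(f"non-finite loss ({loss.item()}) detected, skipping batch")
        return None
    loss.backward()
    # Gradient clipping is another common mitigation for loss blow-ups.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```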
Thanks for the suggestion, I will run it with a lower version of torch.