Open vishnuIgn opened 3 years ago
CTC loss can be numerically unstable in some setups. Unfortunately, I cannot determine the cause without seeing your exact setup and data. There are a number of adjustments people use to avoid such issues; two common ones are gradient clipping and learning-rate warmup.
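As a rough illustration of both tricks, here is a minimal, framework-agnostic sketch (the function names are mine, not from the code this issue is about; in PyTorch the equivalents are `torch.nn.utils.clip_grad_norm_` and a warmup LR scheduler such as `LambdaLR`):

```python
import math

def clip_global_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm does not exceed max_norm.
    This is the same rule torch.nn.utils.clip_grad_norm_ applies."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

def warmup_lr(step, base_lr, warmup_steps):
    """Linear warmup: ramp the learning rate from 0 to base_lr over
    warmup_steps optimizer steps, then hold it constant."""
    return base_lr * min(1.0, step / warmup_steps)
```

In a training loop you would call `clip_global_norm` on the flattened gradients right before the optimizer step, and feed `warmup_lr(step, ...)` to the optimizer each step; both keep early, large CTC gradients from blowing up the weights.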
On a side note, I should probably extend this code and its accompanying post with a follow-up on common training strategies for CTC-based ASR networks.
In the meantime, if you can share a colab or a minimal reproducible snippet with this issue, I can help you debug it.
While training on a custom dataset, the CTC loss becomes NaN after a few epochs.
called epoch end, epoch num: 4
Epoch : 5 LOSS : 24.569025
called epoch end, epoch num: 5
Epoch : 6 LOSS : 20.572273
called epoch end, epoch num: 6
BREAK
Generating train data 2/10
Generating train data 4/10
Generating train data 6/10
Generating train data 8/10
Generating train data 10/10
Epoch : 7 LOSS : 29.43734
called epoch end, epoch num: 7
Epoch : 8 LOSS : nan
The loss became NaN and training stops.
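One cheap guard while debugging this is to skip the optimizer update whenever the loss comes back non-finite, so a single bad batch cannot poison the weights; in PyTorch, `nn.CTCLoss(zero_infinity=True)` similarly zeroes out infinite losses, which typically arise when a target transcript is longer than the network's output sequence. A minimal framework-agnostic sketch (`safe_step` and `apply_update` are illustrative names, not from this code):

```python
import math

def safe_step(loss_value, apply_update):
    """Run the optimizer update only when the loss is finite.
    Returns True if the update ran, False if the batch was skipped."""
    if not math.isfinite(loss_value):
        return False  # skip NaN/inf batches instead of corrupting weights
    apply_update()
    return True
```

Logging how often batches get skipped also helps locate the offending samples, which is usually faster than bisecting the dataset by hand.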