apoorvnandan / speech-recognition-primer

This repository contains code for a tutorial on end to end automatic speech recognition.

ctc loss gets nan #2

Open vishnuIgn opened 3 years ago

vishnuIgn commented 3 years ago

While training with a custom dataset, the CTC loss became nan after a few epochs.

called epoch end, epoch num: 4 Epoch : 5 LOSS : 24.569025

called epoch end, epoch num: 5 Epoch : 6 LOSS : 20.572273

called epoch end, epoch num: 6 (interleaved progress output: generating train data 2/10 … 10/10) Epoch : 7 LOSS : 29.43734

called epoch end, epoch num: 7 Epoch : 8 LOSS : nan

Loss became nan and training stopped.

apoorvnandan commented 3 years ago

CTC loss can be numerically unstable in some setups. Unfortunately, I cannot pinpoint the cause without looking at your exact setup and data. There are several adjustments people use to avoid this; two common ones are gradient clipping and learning rate warmup.
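A minimal, framework-agnostic sketch of both techniques (the function names `warmup_lr` and `clip_gradients` are illustrative, not from this repo; in Keras you would instead pass `clipnorm` to the optimizer and use a learning rate schedule):

```python
import numpy as np

def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    # Linearly ramp the learning rate from 0 up to base_lr
    # over the first warmup_steps optimizer steps.
    return base_lr * min(1.0, step / warmup_steps)

def clip_gradients(grads, max_norm=5.0):
    # Rescale a list of gradient arrays so their global L2 norm
    # does not exceed max_norm; leaves small gradients untouched.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / (total_norm + 1e-6)) for g in grads]
    return grads
```

Warmup keeps early updates small while the alignments found by CTC are still essentially random, and clipping caps the occasional huge gradient that would otherwise push the loss to nan.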

On a side note, I should probably extend this code and its accompanying post with a follow-up on common training strategies for CTC-based ASR networks.

In the meantime, if you can share a colab or a minimal reproducible snippet with this issue, I can help you debug it.