Closed: gabitza-tech closed this issue 1 year ago.
@gabitza-tech did you find any solution to your error?
Hi @nabil6391, not exactly, but I don't run into it as often anymore. It might have been caused by the learning rate combined with low precision (16-bit). In my experiments, I can't reproduce the error when using bf16 or 32-bit precision. With 16-bit precision, training is usually not very stable.
BR, Gabi
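For context on why fp16 tends to be less stable than bf16 or fp32 (this explains the general phenomenon and does not confirm the cause of this particular issue): fp16 has only 5 exponent bits, so its largest finite value is 65504 and small values underflow to zero quickly. bf16 keeps float32's 8-bit exponent and gives up mantissa precision instead. In fp16, a large activation or loss term can overflow to inf, and expressions like inf - inf then produce NaN. The helper names below are hypothetical; the sketch only emulates the two formats with the standard `struct` module.

```python
import struct

def to_fp16(x):
    """Round-trip x through IEEE 754 half precision (struct format 'e')."""
    try:
        return struct.unpack('<e', struct.pack('<e', x))[0]
    except OverflowError:
        # struct refuses values outside the fp16 range; GPU fp16 math yields inf instead
        return float('inf') if x > 0 else float('-inf')

def to_bf16(x):
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding (truncation)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

print(to_fp16(70000.0))   # inf: above the fp16 maximum of 65504
print(to_bf16(70000.0))   # 69632.0: finite, only precision is lost
print(to_fp16(1e-8))      # 0.0: underflows in fp16
print(to_bf16(1e-8) > 0)  # True: still representable in bf16
```

This is also why fp16 mixed-precision training relies on loss scaling, while bf16 usually does not need it.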
Thank you, your suggestion might just solve the same issue I am facing. I will try with "bf16-mixed".
Describe the bug
I am using the NeMo Docker container v23.04, but I observe the same behavior in a conda environment too.
I am training several Conformer models (the behavior is the same with FastConformer Transducer and Conformer Transducer) with several perturbations (speed, and noise from MUSAN), and at the end of the epoch, when validation is performed, I observe the following behavior:
Without augmentations, validation shows the predictions and works OK. With augmentations, val_loss is NaN and val_wer displays a random value.
I think it might be a problem with my noise manifest, but I am unsure why. When I was augmenting VAD audio, I remember cutting the segments to 0.63 s, but I assumed that for ASR the perturbation automatically cuts the noise audio to the right length. It looks like this:
This is my config file:
Thanks in advance!
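Since the noise manifest is a suspect, one cheap first step is to scan it for malformed or degenerate entries before training. The sketch below assumes the usual NeMo JSON-lines manifest layout (one object per line with `audio_filepath` and `duration`); the function name and the `min_duration` threshold are hypothetical placeholders, not NeMo settings, and flagging a clip as short does not mean NeMo rejects it.

```python
import json
import math

def check_noise_manifest(lines, min_duration=0.5):
    """Return (line_number, reason) pairs for suspicious JSON-lines manifest entries.

    min_duration is an arbitrary illustrative threshold, not a NeMo default.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((lineno, f"invalid JSON: {err.msg}"))
            continue
        if "audio_filepath" not in entry:
            problems.append((lineno, "missing audio_filepath"))
        duration = entry.get("duration")
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(duration, (int, float)) or isinstance(duration, bool):
            problems.append((lineno, "missing or non-numeric duration"))
        elif not math.isfinite(duration):
            problems.append((lineno, "non-finite duration"))
        elif duration <= 0:
            problems.append((lineno, f"non-positive duration {duration}"))
        elif duration < min_duration:
            problems.append((lineno, f"very short clip ({duration} s)"))
    return problems

sample = [
    '{"audio_filepath": "noise/a.wav", "duration": 0.63}',
    '{"audio_filepath": "noise/b.wav", "duration": 0.0}',
    '{"audio_filepath": "noise/c.wav"}',
    'not json',
]
for lineno, reason in check_noise_manifest(sample, min_duration=1.0):
    print(lineno, reason)
```

In practice you would pass `open("noise_manifest.json")` (a hypothetical path) instead of the inline sample.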