Closed: JohnHerry closed this issue 2 years ago.
Hi @JohnHerry ,
The problem is not with LJSpeech but custom data, right? In that case try lowering the learning rate. Higher LR leads to better models, but the training becomes unstable.
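For illustration, a minimal sketch (not the actual FastPitch train.py) of what a reduced learning rate plus a gradient-norm clip looks like with the FusedLAMB optimizer mentioned in this thread; the concrete numbers are placeholders, not recommended settings:

```python
# Minimal sketch, not the actual FastPitch training script: build the
# optimizer with a reduced learning rate and clip the gradient norm as an
# extra safeguard.  The numbers here are placeholders.
import torch
from apex.optimizers import FusedLAMB   # the optimizer FastPitch trains with

model = torch.nn.Linear(80, 80).cuda()  # stand-in for the FastPitch model
optimizer = FusedLAMB(model.parameters(), lr=0.05, weight_decay=1e-6)

def train_step(loss):
    optimizer.zero_grad()
    loss.backward()
    # keep a single bad batch from producing a huge weight update
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1000.0)
    optimizer.step()
```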
Thank you for the help. I tried changing the optimizer from FusedLAMB to FusedNovoGrad, and the training is running OK now. I am not sure whether it will work all the time.
About working all the time: with a higher LR you can get a bit better model, but some runs will fail. If you can afford it, I'd test a couple of different LRs, each with a few random seeds to get a feeling of what is safe.
Also, maybe some samples in your data are broken and cause large gradients, triggering the crash.
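If you want to rule that out, here is a minimal sketch of a pre-training scan over the extracted features; the one-.pt-per-utterance layout and the 'mel'/'pitch' keys are assumptions about your pipeline, not FastPitch's actual format:

```python
# Hedged sketch: scan preprocessed features for NaN/Inf or extreme values
# before training.  Adapt the file layout and keys to your own pipeline.
import glob
import torch

for path in glob.glob('dataset/features/*.pt'):   # hypothetical location
    feats = torch.load(path)
    for name in ('mel', 'pitch'):
        t = feats[name].float()
        if not torch.isfinite(t).all():
            print(f'{path}: {name} contains NaN/Inf')
        elif t.abs().max() > 1e4:                 # arbitrary sanity bound
            print(f'{path}: {name} has suspiciously large values '
                  f'(max={t.abs().max():.1f})')
```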
Thank you. The training is still running; I will check the results once it finishes.
I think broken data would crash the training in the first epoch, whereas my training broke after hundreds of epochs. We used pyworld.dio instead of librosa.pyin as the pitch estimator; that is the difference in our data preprocessing.
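One difference worth checking: librosa.pyin marks unvoiced frames with NaN, while pyworld.dio returns 0 there, so the downstream pitch normalization may see different statistics. A minimal comparison sketch (the file name is hypothetical):

```python
# Hedged sketch: compare the two pitch estimators on one utterance.
# pyin uses NaN for unvoiced frames, dio uses 0; make sure the downstream
# normalization handles whichever convention you feed it.
import librosa
import numpy as np
import pyworld as pw

wav, sr = librosa.load('sample.wav', sr=22050)        # hypothetical file

f0_pyin, voiced, _ = librosa.pyin(wav,
                                  fmin=librosa.note_to_hz('C2'),
                                  fmax=librosa.note_to_hz('C7'),
                                  sr=sr)

f0_dio, t = pw.dio(wav.astype(np.float64), sr)
f0_dio = pw.stonemask(wav.astype(np.float64), f0_dio, t, sr)

print('pyin voiced range:', np.nanmin(f0_pyin), np.nanmax(f0_pyin))
print('dio  voiced range:', f0_dio[f0_dio > 0].min(), f0_dio[f0_dio > 0].max())
print('any non-finite values from dio?', not np.isfinite(f0_dio).all())
```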
My attempt with the FusedNovoGrad optimizer turned out to be a failure: training converges too slowly. With FusedLAMB I can get reasonable results by epoch 300, but with FusedNovoGrad the model still synthesizes muffled audio after 4800 epochs.
I will reduce the learning rate and retry.
Related to FastPitch1.1/PyTorch
Describe the bug The training breaks with "loss is NaN". Since the FastPitch loss function is composed of mel_loss, duration loss, pitch loss, energy loss, and attention loss, I printed each component: the mel_loss is NaN and the gradient is zero.
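One way to narrow this down is to test each component of the compound loss for finiteness before summing, so the log shows which term became non-finite first. A minimal sketch (the loss-dict structure is an assumption, not FastPitch's actual code):

```python
# Hedged sketch: check every sub-loss before the backward pass so the log
# shows which term went non-finite first.
import torch

def check_and_sum(loss_dict, step):
    for name, value in loss_dict.items():
        if not torch.isfinite(value):
            raise RuntimeError(f'step {step}: {name} is {value.item()}')
    return sum(loss_dict.values())

# usage inside the training loop (names follow the FastPitch loss terms):
# total = check_and_sum({'mel_loss': mel_loss, 'dur_loss': dur_loss,
#                        'pitch_loss': pitch_loss, 'energy_loss': energy_loss,
#                        'attn_loss': attn_loss}, step)
# total.backward()
```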
To Reproduce This happens occasionally, somewhere between epochs 100 and 400, so it is not caused by bad input.
Expected behavior
My question is more general: how should training be designed to avoid vanishing gradients or NaN losses when using a compound loss function like FastPitch's? A problem in any one of the sub-losses can break the whole training. And when that happens, how can I track down the problem and find the reason?
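There is no FastPitch-specific answer to this, but two generic PyTorch aids are commonly used: torch.autograd.set_detect_anomaly gives a traceback pointing at the operation that produced the first non-finite gradient, and a guard around the optimizer step can save and skip the offending batch instead of crashing a long run. A hedged sketch, not the project's actual code:

```python
# Hedged sketch of two generic PyTorch debugging aids:
# (1) anomaly detection to locate the op producing the first NaN gradient
#     (slow; enable only while debugging), and
# (2) skipping and serializing a bad batch so one spike does not kill the run.
import torch

torch.autograd.set_detect_anomaly(True)

def safe_step(model, optimizer, loss, batch, step):
    optimizer.zero_grad()
    loss.backward()
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1000.0)
    if not torch.isfinite(grad_norm):
        # dump the offending batch for offline inspection instead of crashing
        torch.save(batch, f'bad_batch_step{step}.pt')
        optimizer.zero_grad()              # drop this update, keep training
        return False
    optimizer.step()
    return True
```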
Environment Please provide at least: