NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

[FastPitch/Pytorch] Warmup: learning rate increases then decreases. Is it normal? #876

Closed. juliakorovsky closed this issue 2 years ago.

juliakorovsky commented 3 years ago

Hello. I'm trying to train a new FastPitch model from scratch for another language, and I've noticed that during the warmup steps the learning rate first increases from 3.16e-06 to around 2.70e-03, then starts decreasing, even before the warmup steps are over. Is this how it's supposed to work? In the paper I've read that:

"Learning rate is increased during 1000 warmup steps, and then decayed according to the Transformer schedule".

And from what I see, while the learning rate is increasing the validation loss is decreasing, but once the lr starts decreasing again the validation loss stays more or less the same.

I use the standard parameters for training: epochs - 1500, warmup-steps - 1000 (the lr starts decreasing at around 20 epochs), weight-decay - 1e-6, optimizer - lamb.
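For reference, here is a minimal sketch of a Noam-style warmup/decay schedule of the kind the paper describes; the function name `transformer_lr` and the base learning rate of 0.1 are assumptions for illustration, not necessarily the exact code in this repo:

```python
def transformer_lr(step, base_lr=0.1, warmup_steps=1000):
    """Noam-style schedule: linear warmup, then inverse-sqrt decay.

    The peak value base_lr / sqrt(warmup_steps) is reached exactly
    at step == warmup_steps.
    """
    step = max(step, 1)
    scale = min(step * warmup_steps ** -1.5, step ** -0.5)
    return base_lr * scale

# With base_lr=0.1 and warmup_steps=1000:
#   step 1    -> 0.1 * 1000**-1.5 ~= 3.16e-06
#   step 1000 -> 0.1 * 1000**-0.5 ~= 3.16e-03 (peak)
#   step 4000 -> 0.1 * 4000**-0.5 ~= 1.58e-03
```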

alancucki commented 3 years ago

Hi @juliakorovsky,

learning rate first increases from 3.16e-06 to around 2.70e-03, then starts decreasing, even before the warmup steps are over

The peak learning rate on LJSpeech is around 3.16e-03, reached towards the end of the 21st epoch. Warmup is specified in a number of optimizer steps, and the number of steps per epoch changes with the size of your dataset. Try adjusting your warmup steps so that you hit a similar peak learning rate.
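As a rough illustration (a back-of-the-envelope sketch only, assuming the Noam-style schedule above; `base_lr`, `num_clips`, and `effective_batch` are example values, not the repo's exact configuration):

```python
import math

base_lr = 0.1           # assumed default; check your --learning-rate value
target_peak = 3.16e-3   # peak learning rate mentioned above for LJSpeech

# Under the schedule sketched earlier, peak LR = base_lr / sqrt(warmup_steps),
# so the warmup length that produces a given peak is:
warmup_steps = round((base_lr / target_peak) ** 2)      # ~1000

# Warmup counts optimizer steps, so the epoch at which the peak falls depends
# on how many steps one epoch takes on *your* dataset (example numbers only):
num_clips, effective_batch = 13100, 256                 # placeholders
steps_per_epoch = math.ceil(num_clips / effective_batch)
peak_epoch = warmup_steps / steps_per_epoch             # ~20 with these numbers
```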

And from what I see, while the learning rate is increasing the validation loss is decreasing, but once the lr starts decreasing again the validation loss stays more or less the same.

That's normal: the L2 validation loss correlates poorly with the subjective quality of the audio. For instance, it doesn't take misalignment in time into account.