NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

ASpIRE dataset #471

Open flassTer opened 5 years ago

flassTer commented 5 years ago

Hello, I am trying to transfer learn the Jasper model with the same vocabulary trained on the LibriSpeech dataset (other+clean). It seems that the number of epochs is being calculated according to the number of global steps that have been completed. As a result, even though I trained the Jasper model for 14 epochs on Librispeech (that completed 123034 steps), when I run the ASpIRE dataset it displays epoch 1537. Additionally, it doesn't continue training and immediately finishes saying "Not enough steps for benchmarking".

My question is: What should I do so that I can continue training with the ASpIRE dataset?

blisc commented 5 years ago

In OpenSeq2Seq, you can specify either num_epochs or max_steps. You are correct that with continue_learning the epoch calculation will be off, so I would advise you to try max_steps instead. I would also tune your learning rate scheduler to make sure the learning rate is correct, since you are continuing from a non-zero step.
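For example, the relevant part of a fine-tuning config might swap num_epochs for max_steps (a minimal sketch; the concrete numbers and the logdir name are illustrative, not taken from the shipped Jasper configs):

```python
# Illustrative fragment of an OpenSeq2Seq config's base_params dict.
base_params = {
    # "num_epochs": 50,      # epoch count is derived from global_step, so it is
    #                        # misleading when restoring from a checkpoint
    "max_steps": 150000,     # stop by absolute global step instead (made-up value)
    "logdir": "jasper_log_folder",  # directory holding the LibriSpeech checkpoint
}
```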

flassTer commented 5 years ago

Thank you for the advice @blisc. What would you recommend for the learning rate scheduler?

blisc commented 4 years ago

Since the learning rate scheduler is computed from the current step number, I would just plot the learning rate in TensorBoard to make sure that it is what you expect it to be.

I.e., most learning rate schedules decay to some minimum value. Naively continuing without adjusting the step counts in the scheduler will cause your job to train at this minimum LR the whole time, instead of decaying or changing the way you want it to.
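As a concrete illustration, here is a minimal polynomial-decay schedule in the style of OpenSeq2Seq's poly_decay policy (a self-contained re-implementation with assumed parameter names, not the library's exact code). If the restored global step is already past decay_steps, every subsequent step trains at min_lr:

```python
def poly_decay(global_step, learning_rate, decay_steps, power=1.0, min_lr=0.0):
    """Polynomial learning-rate decay, clipped to min_lr once decay finishes."""
    step = min(global_step, decay_steps)
    lr = learning_rate * (1.0 - step / decay_steps) ** power
    return max(lr, min_lr)

# Schedule tuned for the original 123k-step LibriSpeech run (made-up values):
print(poly_decay(0,      1e-2, decay_steps=123034, power=2.0, min_lr=1e-5))  # 0.01
print(poly_decay(123034, 1e-2, decay_steps=123034, power=2.0, min_lr=1e-5))  # 1e-05

# Restoring that checkpoint and continuing without changing decay_steps:
# every fine-tuning step sits at the floor, so effectively no decay happens.
print(poly_decay(150000, 1e-2, decay_steps=123034, power=2.0, min_lr=1e-5))  # 1e-05
```

Plotting the learning rate, as suggested above, makes this failure mode obvious: the curve is a flat line at min_lr from the very first fine-tuning step.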