NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

WER stuck at 0.3 when fine-tuned with vacuum noise #858

bagavi closed this issue 3 years ago

bagavi commented 4 years ago

Goal: Fine-tuning QuartzNet 15x5 on test_clean_100 plus vacuum noise (10 dB SNR).

Setting: I am using the speech2text.py script to train on a single small GPU with batch_size=8, lr=1e-4, warmup_ratio=0.02, num_epochs=200, and the rest of the parameters at their defaults. I have attached a screenshot of the Weights & Biases board.
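
For context, the 10 dB mixing above amounts to something like this (a rough sketch, not NeMo code; `mix_at_snr` is just an illustrative helper name):

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float = 10.0) -> np.ndarray:
    """Mix a noise clip into a clean waveform at a target SNR (illustrative only)."""
    # Tile/trim the noise so it covers the whole utterance.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]
    # Scale the noise so that 10 * log10(P_clean / P_noise) == snr_db.
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10.0)))
    return clean + scale * noise
```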

Problem: The WER has been stuck at 0.3 (with noise) and 0.2 (without noise) from epoch 4 to epoch 14.

Questions:

  1. Is this expected, given that I am using a batch size of only 8 and the model has only trained for 14 epochs?
  2. Should I reduce the number of epochs from 200 to 10 or 20?
  3. Should I reduce my learning rate to account for the batch size of 8? (A rough sketch of what I mean is below.)

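For question 3, what I have in mind is the usual linear-scaling heuristic (my own assumption, not something from the NeMo docs): scale the learning rate in proportion to the batch size relative to the reference setup.

```python
# Linear-scaling heuristic (my assumption, not an official NeMo recommendation).
# The reference lr/batch size below are hypothetical placeholders for whatever
# the original QuartzNet experiments used.
ref_lr, ref_bs = 1e-2, 256   # hypothetical reference values
my_bs = 8
scaled_lr = ref_lr * my_bs / ref_bs
print(scaled_lr)  # 3.125e-04
```
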
Is it possible to share the WandB boards for the fine-tuning runs from your latest papers?

Thanks in advance, and congratulations on this awesome work :)

okuchaiev commented 4 years ago

I'd recommend increasing warmup_ratio to 0.12. Also, your batch size is small - we never tried such a small bs, but this means that you have to use a smaller bs than in our experiments.

bagavi commented 4 years ago

Could you clarify “you have to use a smaller bs than in our experiments”? Do you mean a smaller lr (learning rate)?

Also, what is the reason behind a 0.12 warmup ratio for fine-tuning vs. a 0.02 warmup ratio for training from scratch?
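
My current mental model of warmup_ratio (an assumption on my part, not read from the NeMo source) is that it just sets what fraction of the total training steps are spent linearly ramping the lr up before the decay phase, roughly:

```python
def lr_at_step(step: int, total_steps: int, peak_lr: float, warmup_ratio: float) -> float:
    """Assumed behaviour of warmup_ratio: linear warmup, then polynomial decay."""
    warmup_steps = max(int(warmup_ratio * total_steps), 1)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # linear ramp-up
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return peak_lr * (1.0 - progress) ** 2            # decay toward 0
```

If that is right, does the larger ratio for fine-tuning just keep the absolute number of warmup steps comparable when the run is much shorter?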

Thanks!