NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Different results between Validation at training time and inference #441

Closed Giuseppe5 closed 4 years ago

Giuseppe5 commented 4 years ago

Hello,

I'm currently fine-tuning QuartzNet. At the end of the training process, the script computes the WER on the evaluation set. However, if I then just load the checkpoints and run evaluation only, without any training, the WER I get is slightly different from the one obtained during the training phase.

Is there any reason for this?

Thanks for your help!

vsl9 commented 4 years ago

Thank you for the question! The reason behind this difference in WER is that we add a small amount of noise during training (dithering). To get exactly the same predictions during inference, please set the dithering gain factor to zero, either in code with `model_definition['AudioToMelSpectrogramPreprocessor']['dither'] = 0` or in the YAML config file: https://github.com/NVIDIA/NeMo/blob/7c3081c4fa94e962507d47d5ec652f62dc10894f/examples/asr/configs/quartznet15x5.yaml#L22
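To make the suggestion above concrete, here is a minimal sketch of zeroing the dither gain before an evaluation-only run. The `model_definition` dict below is a hypothetical stand-in for the parsed QuartzNet YAML config (the key names follow the `quartznet15x5.yaml` linked above); the helper function is an illustration, not part of the NeMo API.

```python
import copy

def disable_dither(model_definition):
    """Return a copy of the config with the preprocessor's dither gain
    set to zero, so spectrogram features are deterministic at inference."""
    cfg = copy.deepcopy(model_definition)
    cfg['AudioToMelSpectrogramPreprocessor']['dither'] = 0.0
    return cfg

# Minimal stand-in for a parsed YAML config (illustrative values only):
model_definition = {
    'AudioToMelSpectrogramPreprocessor': {'dither': 1e-05, 'features': 64}
}

eval_cfg = disable_dither(model_definition)
print(eval_cfg['AudioToMelSpectrogramPreprocessor']['dither'])  # 0.0
```

The same effect can be had by editing `dither: 0.0` directly in the YAML file; the copy-based helper just avoids mutating the training config in place.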

Giuseppe5 commented 4 years ago

Thanks for the info!

owos commented 5 months ago

Hi @vsl9, I have a similar issue, but my performance gets worse after loading the model for inference once fine-tuning is done. Is there a reason for this, and a solution to improve performance during inference?