NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

[Question] Normalization value #1203

Closed: khursani8 closed this issue 4 years ago

khursani8 commented 4 years ago

I'm following the 02_Online_ASR_Microphone_Demo tutorial and wondering where the normalization constants come from. If I'm using a different dataset, should I change them to other values?

titu1994 commented 4 years ago

Sorry for the delay; I wanted to polish up the notebook first.

The normalization constants are computed for a specific dataset, so if the model was trained on a different dataset, the constant values need to be changed accordingly.

The global mean and std vectors can be computed for a given normalization scheme ("per_feature" or "all_features") using the following notebook.
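As a rough illustration of what those two schemes mean, here is a minimal NumPy sketch, assuming each utterance's features are a log-mel array shaped (n_features, n_frames). The helpers `compute_norm_stats` and `normalize` are hypothetical and are not NeMo's API; NeMo's own preprocessing code may differ in details such as the epsilon and whether std uses ddof=0.

```python
import numpy as np


def compute_norm_stats(features_list, scheme="per_feature"):
    """Hypothetical helper: global mean/std over a dataset of features.

    features_list: iterable of arrays shaped (n_features, n_frames),
    one per utterance (assumed layout, e.g. 64 mel bins x time).
    """
    # Pool every frame of every utterance along the time axis.
    all_frames = np.concatenate(list(features_list), axis=1)
    if scheme == "per_feature":
        # One mean/std per feature bin (e.g. per mel channel).
        mean = all_frames.mean(axis=1)
        std = all_frames.std(axis=1)
    elif scheme == "all_features":
        # A single scalar mean/std shared by every bin, kept as shape (1,).
        mean = np.array([all_frames.mean()])
        std = np.array([all_frames.std()])
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return mean, std


def normalize(features, mean, std, eps=1e-5):
    # Broadcast the (n_features,) or (1,) stats across the frame axis.
    return (features - mean[:, None]) / (std[:, None] + eps)
```

Because the constants are fixed once computed, they only make sense for features drawn from a similar distribution to the data they were computed on, which is why a model trained on a different dataset needs different values.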

You will notice that the constants computed for Librispeech alone are slightly different from the ones the notebook displays. This is because the QuartzNet model was trained on a combination of Librispeech and other datasets, so its normalization constants differ slightly.
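To see why a multi-dataset model ends up with different constants, the sketch below accumulates per-feature sums separately for each dataset and then merges them, giving the statistics of the pooled data. `RunningStats` is a hypothetical helper, not part of NeMo, and the feature layout (n_features, n_frames) is an assumption. It uses the simple sum / sum-of-squares formula; for very large datasets a numerically more stable update (e.g. Welford's method) may be preferable.

```python
import numpy as np


class RunningStats:
    """Hypothetical per-feature accumulator that can be merged across datasets."""

    def __init__(self, n_features):
        self.count = 0
        self.sum = np.zeros(n_features)
        self.sum_sq = np.zeros(n_features)

    def update(self, features):
        # features: (n_features, n_frames) for one utterance.
        self.count += features.shape[1]
        self.sum += features.sum(axis=1)
        self.sum_sq += (features ** 2).sum(axis=1)

    def merge(self, other):
        # Pooled statistics: just add the counts and sums.
        merged = RunningStats(len(self.sum))
        merged.count = self.count + other.count
        merged.sum = self.sum + other.sum
        merged.sum_sq = self.sum_sq + other.sum_sq
        return merged

    def mean_std(self):
        mean = self.sum / self.count
        # Population variance E[x^2] - E[x]^2, clipped to avoid tiny negatives.
        var = self.sum_sq / self.count - mean ** 2
        return mean, np.sqrt(np.maximum(var, 0.0))
```

The merged mean and std equal those of the concatenated data, so they shift toward whichever dataset contributes more frames rather than matching any single dataset's constants.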

khursani8 commented 4 years ago

Thanks, the notebook was really helpful for me :)