NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

LM on Parakeet models #9500

Open jinmingteo opened 1 week ago

jinmingteo commented 1 week ago

Hi team,

I am trying to add a language model (LM) to Parakeet models, and I think this is the relevant script:

eval_beamsearch_ngram_transducer.py

I have tried using it and ran into a couple of issues with EncDecRNNTBPEModel. For example, the forward function now returns 3 outputs. Are there other scripts I should be looking into?

Thanks

titu1994 commented 1 week ago

For transducer models, forward has always returned 3 outputs, because it is just the encoder forward pass. During decoding, we take those encoder outputs and pass them autoregressively to the decoder and joint.
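To illustrate the decoding step described above, here is a minimal greedy sketch in plain Python. It is not NeMo's implementation: `greedy_transducer_decode`, `toy_predictor_step`, `toy_joint`, the blank index of 0, and the `max_symbols_per_frame` cap are all hypothetical names and assumptions chosen for this example. It only shows how each encoder frame is combined with the decoder (prediction network) output in the joint, and how each emitted token is fed back into the decoder.

```python
from typing import Callable, List, Optional, Sequence, Tuple

BLANK = 0  # assumed blank index for this toy example


def greedy_transducer_decode(
    encoder_frames: Sequence[Sequence[float]],
    predictor_step: Callable[[int, Optional[object]], Tuple[Sequence[float], object]],
    joint: Callable[[Sequence[float], Sequence[float]], Sequence[float]],
    max_symbols_per_frame: int = 5,
) -> List[int]:
    """Greedy transducer decoding over precomputed encoder frames."""
    hypothesis: List[int] = []
    # Prime the decoder with the blank token as a start-of-sequence symbol.
    pred_out, state = predictor_step(BLANK, None)
    for frame in encoder_frames:
        # Emit at most max_symbols_per_frame tokens before moving on.
        for _ in range(max_symbols_per_frame):
            logits = joint(frame, pred_out)
            token = max(range(len(logits)), key=logits.__getitem__)
            if token == BLANK:
                break  # blank means: advance to the next encoder frame
            hypothesis.append(token)
            # Autoregressive step: feed the emitted token back to the decoder.
            pred_out, state = predictor_step(token, state)
    return hypothesis


def toy_predictor_step(token: int, state: Optional[object]):
    """Hypothetical decoder: returns a one-hot vector of the last token."""
    one_hot = [0.0, 0.0, 0.0]
    one_hot[token] = 1.0
    return one_hot, state


def toy_joint(frame: Sequence[float], pred: Sequence[float]) -> List[float]:
    """Hypothetical joint: frame scores, penalizing a repeat of the last token."""
    return [frame[0]] + [f - 10.0 * p for f, p in zip(frame[1:], pred[1:])]


frames = [[1.5, 2.0, 1.0], [3.0, 0.0, 0.0], [1.5, 0.5, 2.0]]
print(greedy_transducer_decode(frames, toy_predictor_step, toy_joint))  # [1, 2]
```

In this sketch the encoder is run once up front, and only the decoder and joint are called inside the loop. A beam search with an LM would keep several hypotheses per frame instead of the single argmax used here.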

For beam search support, @karpnv