NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Initialize a Parakeet Cache-Aware Streaming model's encoder from an offline model. #11250

Open gabitza-tech opened 3 weeks ago

gabitza-tech commented 3 weeks ago

Hello,

Thank you for all your amazing work!

At the moment, NeMo only supports the 120M FastConformer streaming model, but I would like to try a bigger one. Is it possible to modify the architecture to match the 600M Parakeet model and initialize the encoder weights from the offline Parakeet model? Otherwise, a bare 600M streaming model trained without any SSL pretraining would probably perform worse and train more slowly than the 120M model.

Best regards, Gabi

sandergs92 commented 13 hours ago

See the following answer: https://github.com/NVIDIA/NeMo/issues/9615#issuecomment-2264136385
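
Separately from the linked answer, here is a minimal sketch of the general idea in the question: copy only those encoder parameters whose name and shape match between the two models. It is not NeMo's own mechanism. The helper name `select_transferable` and the `encoder.` prefix are assumptions made for illustration, and the sketch only relies on each value exposing a `.shape`, as PyTorch tensors do.

```python
from typing import Any, Dict, List, Tuple


def select_transferable(
    src_state: Dict[str, Any],
    dst_state: Dict[str, Any],
    prefix: str = "encoder.",  # assumed parameter-name prefix, adjust to the real model
) -> Tuple[Dict[str, Any], List[str]]:
    """Pick source entries under `prefix` whose name and shape match the destination.

    Returns the entries that can be copied and the names that were skipped
    (missing in the destination or different in shape).
    """
    transferable: Dict[str, Any] = {}
    skipped: List[str] = []
    for name, value in src_state.items():
        if not name.startswith(prefix):
            continue  # ignore decoder, joint network, preprocessor, etc.
        target = dst_state.get(name)
        if target is not None and tuple(target.shape) == tuple(value.shape):
            transferable[name] = value
        else:
            skipped.append(name)
    return transferable, skipped


# Hypothetical usage with two already-loaded PyTorch modules
# (offline_model and streaming_model are placeholders, not NeMo names):
#
#   weights, skipped = select_transferable(
#       offline_model.state_dict(), streaming_model.state_dict()
#   )
#   streaming_model.load_state_dict(weights, strict=False)
#   print("not transferred:", skipped)
```

Inspecting the skipped names shows which layers stay randomly initialized. Matching shapes do not guarantee matching behavior, since a streaming encoder may use the same weights with different context or padding, so fine-tuning after the copy should be expected.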