NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Unable to reproduce cache aware streaming results for Conformer that were there for Fastconformer. #9495

Closed mujhenahiata closed 5 days ago

mujhenahiata commented 1 week ago

I trained a Conformer-CTC model using the cache-aware method, but I am unable to reproduce the results reported for "Fastconformer cache aware streaming". With the cache-aware Conformer-CTC model, the streaming output contains partial sub-words and words merged together (for example, "hello world" ==> "herld").

The same thing happened when I deployed the model in a RIVA pipeline.

However, when I use the "transcribe" function, I get a proper transcription of the audio file. @titu1994 @VahidooX, could you please shed some light on this? When and how should streaming models be used in .nemo format and in a RIVA pipeline? My question is related to these discussions: [https://github.com/NVIDIA/NeMo/discussions/7010] [https://github.com/NVIDIA/NeMo/discussions/5284]
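To make the gap between the offline ("transcribe") output and the streaming output measurable rather than anecdotal, the two transcripts can be scored against the same reference. The snippet below is only a minimal sketch in plain Python: the helper names `edit_distance` and `word_error_rate` are hypothetical and are not part of NeMo or RIVA, and the strings are the example from this post, not real model output.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    reference = "hello world"       # ground-truth text (example from this post)
    offline_output = "hello world"  # stands in for the "transcribe" result
    streaming_output = "herld"      # stands in for the streaming result

    print("offline WER:  ", word_error_rate(reference, offline_output))
    print("streaming WER:", word_error_rate(reference, streaming_output))
```

Running the same comparison over a held-out set for both decoding paths would show whether the degradation is consistent across utterances or limited to specific audio.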