@kafan1986 did you mean whether the CTC decoder supports streaming ASR? I have been working with the Conformer-CTC decoder and want to ask if it is possible to do streaming ASR with a CTC decoder. @burchim I would appreciate your comments on this.
Hi @harisgulzar1,
Streaming ASR is possible using CTC. There is currently no implementation of a streaming decoding function, but it is absolutely possible!
Decoding can be performed chunk by chunk, using the CTC encoder in a convolutional manner, with the context size and step size (in audio frames) given as hyper-parameters.
It is also possible to decode a streaming audio signal frame by frame with a model trained with a causal context (future context masked in the attention and convolution layers). But this would require small changes to the implementation and configs. For now, all the provided configs train full-context models.
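To make the chunk-by-chunk idea concrete, here is a minimal sketch of greedy CTC decoding over a sliding window with a bit of re-fed left context. The `TinyEncoder` module, the chunk/left-context sizes, and the assumption of no subsampling are placeholders standing in for the real Conformer-CTC encoder and its config, not code from this repository.

```python
# Minimal sketch of chunk-by-chunk greedy CTC decoding.
# TinyEncoder stands in for the trained Conformer-CTC encoder; chunk size,
# left context, and the no-subsampling assumption are illustrative only.

import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Placeholder encoder: 1D convolution over features -> token log-probs."""
    def __init__(self, feat_dim=80, vocab_size=32):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, 128, kernel_size=3, padding=1)
        self.out = nn.Linear(128, vocab_size)

    def forward(self, feats):                        # feats: (1, T, feat_dim)
        x = self.conv(feats.transpose(1, 2)).transpose(1, 2)
        return self.out(torch.relu(x)).log_softmax(dim=-1)   # (1, T, vocab)

def greedy_ctc_stream(encoder, feats, chunk=64, left_ctx=16, blank_id=0):
    """Decode feats (T, feat_dim) chunk by chunk, keeping some left context."""
    encoder.eval()
    tokens, prev = [], blank_id
    with torch.no_grad():
        for start in range(0, feats.size(0), chunk):
            # Re-feed a few past frames so convolutions/attention see history;
            # future frames beyond the current chunk are never used.
            ctx_start = max(0, start - left_ctx)
            window = feats[ctx_start:start + chunk].unsqueeze(0)
            log_probs = encoder(window)[0]            # (T_window, vocab)
            new = log_probs[start - ctx_start:]       # outputs for new frames only
            for t in new.argmax(dim=-1).tolist():     # greedy CTC collapse
                if t != blank_id and t != prev:
                    tokens.append(t)
                prev = t
    return tokens

if __name__ == "__main__":
    enc = TinyEncoder()
    dummy_feats = torch.randn(400, 80)                # ~4 s of 10 ms frames
    print(greedy_ctc_stream(enc, dummy_feats))
```

With a model trained on full context, larger chunks and more left context usually recover more accuracy, at the cost of latency.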
Hi @burchim, thanks for your comment. That clears up my doubt. I will try to implement it and see how it goes.
@harisgulzar1 did you get the time to implement and test the streaming ASR? How is the performance?
@debasish-mihup I haven't implemented it yet. But I will do it soon. In the meanwhile, I found this tutorial for building a streaming ASR pipeline. You may find it helpful. https://colab.research.google.com/github/pytorch/audio/blob/gh-pages/main/_downloads/bd34dff0656a1aa627d444a8d1a5957f/online_asr_tutorial.ipynb#scrollTo=joQ2X3uYAnfC
@harisgulzar1 I took a look at the shared notebook. I have a doubt: are they only maintaining the context and using it during the decoder part of the pipeline, or are they using context during the encoder phase as well?
I have implemented the inference code based on the above Google Colab example, but the accuracy is very poor for small iterative chunks of audio. I think that for a streaming application we need to retrain the models with zero future context, by setting the causal parameter to True in encoder.py. Is my intuition about this correct?
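For what it's worth, the intuition seems right: a full-context model attends to future frames during training, so feeding it short chunks at inference creates a train/test mismatch, while a causal model masks the future and avoids it. Below is a rough, illustrative sketch of the two ingredients such a causal setup typically needs; the names and shapes are my own, and the repository's actual causal flag may wire this differently.

```python
# Sketch of "zero future context" components for a causal (streaming) block:
# a lower-triangular attention mask plus a left-padded depthwise convolution.
# Illustrative only; not taken from the repository's implementation.

import torch
import torch.nn as nn

def causal_attention_mask(seq_len):
    """True where attention is allowed: each frame sees itself and the past only."""
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

class CausalDepthwiseConv(nn.Module):
    """Depthwise conv padded only on the left, so no future frames leak in."""
    def __init__(self, channels, kernel_size):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(channels, channels, kernel_size, groups=channels)

    def forward(self, x):                             # x: (N, channels, T)
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

if __name__ == "__main__":
    print(causal_attention_mask(6).int())
    conv = CausalDepthwiseConv(channels=8, kernel_size=3)
    print(conv(torch.randn(2, 8, 20)).shape)          # (2, 8, 20): length preserved
```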
@harisgulzar1 Did you get it to work reliably in streaming mode?
Hello @burchim
What changes are required to support streaming inference?