alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Dividing ASR processing pipeline #783

Closed RafNie closed 2 years ago

RafNie commented 2 years ago

Hi

I'm wondering whether it would be possible to divide the ASR processing pipeline. I see that Vosk uses kaldi::SingleUtteranceNnet3Decoder, which performs both the neural network computation and the FST computation. Some CPU effort is also needed for feature and i-vector extraction. All of this is processed on one core. I suppose the AM and LM computations are the most expensive. For small systems such as the Raspberry Pi, it would be nice to be able to run these computations on separate cores. Is there a Kaldi decoder that allows these operations to run in separate threads? Or would it be relatively easy to modify an existing one? I don't have much knowledge of Kaldi decoders, so I'm asking those with more experience.

BR, Rafał

nshmyrev commented 2 years ago

We have issue #356 about this problem. In general, there are many open questions in the design of such a system. For us, the first priority is the move to PyTorch, which should make a threaded design easier.
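
For illustration only, the kind of split asked about above (feature extraction, acoustic model, graph search, each in its own thread) might be sketched as a chain of stages connected by bounded queues. This is a generic producer/consumer sketch, not Vosk or Kaldi code: `extract_features`, `acoustic_model`, `graph_search`, `run_stage` and `run_pipeline` are hypothetical names, and the stage bodies are toy placeholders. In pure Python the stages would not actually run in parallel because of the GIL; the layout only pays off when each stage calls native code that releases it.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the audio stream


def run_stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(chunks, stages, maxsize=8):
    """Run each stage in its own thread; bounded queues give back-pressure."""
    queues = [queue.Queue(maxsize) for _ in range(len(stages) + 1)]

    def feed():
        # Feeding from a separate thread lets the caller drain results
        # concurrently, so full queues cannot deadlock the pipeline.
        for chunk in chunks:
            queues[0].put(chunk)
        queues[0].put(_STOP)

    workers = [threading.Thread(target=feed)]
    workers += [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for worker in workers:
        worker.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results


# Toy placeholder stages (hypothetical, NOT real ASR computations).
def extract_features(chunk):
    return [sample / 32768 for sample in chunk]


def acoustic_model(features):
    return sum(abs(value) for value in features)


def graph_search(score):
    return "speech" if score > 0.5 else "silence"
```

Because every stage is a single thread reading from a FIFO queue, results come out in the same order as the input chunks. For example, `run_pipeline(audio_chunks, [extract_features, acoustic_model, graph_search])` would return one label per chunk.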