alumae / gst-kaldi-nnet2-online

GStreamer plugin around Kaldi's online neural network decoder
Apache License 2.0

Real-time recognition #77

Closed by LqNoob 5 years ago

LqNoob commented 5 years ago

Sometimes we have to wait a few seconds for the ASR system to process a piece of audio. How did you solve this? How can the recognition speed be improved?

In other words: how do you get the results as soon as the user stops speaking?

alumae commented 5 years ago

I guess some of the factors that affect decoding latency are audio throughput latency (how fast the audio arrives at the server) and speech quality: clear speech with little background noise is simply faster to decode (because pruning removes more hypotheses).

LqNoob commented 5 years ago

Thank you for your reply! Testing Kaldi's online decoding (online2-wav-nnet2-latgen-faster.cc), I found that a large part of the time is spent reading and processing the audio in chunks of the configured chunk length. I don't know how to deal with this problem.
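
To make the effect of chunk length concrete, here is a rough sketch of a toy latency model. It is not Kaldi's or the plugin's actual behaviour. It assumes audio arrives in real time, a chunk can only be processed once it has fully arrived, and processing a chunk costs a constant real-time factor (`rtf`) times its duration. The function name and all parameters are hypothetical.

```python
def finish_latency(utt_ms: int, chunk_ms: int, rtf: float) -> float:
    """Seconds between the end of speech and the end of decoding.

    Toy model (assumption): audio arrives in real time, each chunk is
    processed only after it has fully arrived, and processing a chunk
    takes rtf * chunk duration.
    """
    if utt_ms <= 0 or chunk_ms <= 0 or rtf < 0:
        raise ValueError("utt_ms and chunk_ms must be positive, rtf non-negative")

    consumed_ms = 0          # audio handed to the decoder so far
    decoder_free_at = 0.0    # wall-clock time (ms) when the decoder is idle again

    while consumed_ms < utt_ms:
        this_chunk = min(chunk_ms, utt_ms - consumed_ms)
        consumed_ms += this_chunk
        arrival = consumed_ms                     # chunk is complete at this time
        start = max(arrival, decoder_free_at)     # wait for audio or for the decoder
        decoder_free_at = start + this_chunk * rtf

    return (decoder_free_at - utt_ms) / 1000.0


if __name__ == "__main__":
    # 10 s utterance, decoder running at half real time (assumed value)
    for chunk in (200, 2000, 10000):
        print(chunk, "ms chunks ->", finish_latency(10_000, chunk, 0.5), "s")
```

In this model, as long as `rtf` is below 1, the wait after the speaker stops is roughly the time needed to process the last chunk, so smaller chunks give a faster final result. Passing the whole recording as a single chunk makes the wait grow with the utterance length.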

zhaochao2504 commented 5 years ago

I just started studying Kaldi and found this project on GitHub. Does gst-kaldi-nnet2-online support only i-vector features? Must the input features be mfcc/fbank, or can I use other features just by changing the configuration file?

alumae commented 5 years ago

It does support i-vectors. Only mfcc/fbank features are supported.

zhaochao2504 commented 5 years ago

Thank you very much.