alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

How to get result immediately? #203

Closed: aj3423 closed this issue 2 months ago

aj3423 commented 1 year ago

I understand that the server waits for more input data after the user is done speaking. But in my case the user says a lot of short sentences or single words, and some words are especially slow to recognize, such as "underscore".

I tested with the model vosk-model-en-us-0.22-lgraph (128M).

I tried sending a chunk of 40k binary zeroes to the server. My thinking was that the server might treat it as five seconds of silence, stop waiting, and return the result immediately. This does work with the 200-phrase phrase_list: with the zeroes, it waits less than 500 ms, compared to 1+ second previously.

But it doesn't work in the full phrase_list mode, where it still takes 10+ seconds. I notice that while it is processing "underscore", one of my CPU cores stays at about 80%.
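A minimal sketch of how such a silence chunk can be sized. It assumes the stream is raw 16-bit mono PCM at 16 kHz, which is not stated above; the names `silence_chunk` and `chunk_duration` are made up for illustration. Under those assumptions, 40,000 zero bytes correspond to 1.25 seconds of silence (2.5 seconds at 8 kHz), so a true five-second chunk would need to be larger.

```python
# Assumptions (not confirmed by the setup described above):
# raw PCM, 16-bit samples, mono, 16 kHz sample rate.
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2


def silence_chunk(seconds, sample_rate=SAMPLE_RATE):
    """Return a zero-filled byte buffer lasting `seconds` of audio."""
    n_samples = int(seconds * sample_rate)
    return bytes(n_samples * BYTES_PER_SAMPLE)


def chunk_duration(n_bytes, sample_rate=SAMPLE_RATE):
    """Return how many seconds of audio `n_bytes` of PCM data represent."""
    return n_bytes / (sample_rate * BYTES_PER_SAMPLE)
```

For example, `chunk_duration(40000)` gives the duration of the 40k chunk under these assumptions, and `silence_chunk(5)` builds a buffer that really is five seconds long, which can then be sent as a binary message after the speech data.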

I have two questions:

  1. Why does the word "underscore" take so long?
  2. Is there a built-in way to stop waiting?

BTW, I made a tool that demonstrates how to capture sound from the microphone and send it to the VOSK server. It can also play back the input and save it to a .wav file for troubleshooting. It is written in Golang and is cross-platform. In case anyone wants an interactive demo: https://github.com/aj3423/vosk-sound-test/

And I'm sure there is no sound quality problem in my case.