alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0
882 stars, 243 forks

Is it possible to flush and get the final result quicker? #199

Closed: aj3423 closed this issue 1 year ago

aj3423 commented 1 year ago

When testing with a .wav file, I send its content to the "alphacep/kaldi-en" Docker container to get the result. The problem is that even though the .wav is only 0.7 seconds long, it takes 2 seconds to get the final result.

With another file, test16k.wav, which is 8 seconds long, it takes 2.46 seconds (files attached).

Files of different durations take a similar time to recognize, so I guess the 2.x seconds are not spent on recognition but on waiting for further voice input.

I tried sending {"eof" : 1} right after the .wav data, but it changed nothing. My code looks like this:

// Send the full .wav binary data to the Docker container's WebSocket.
err := ws.WriteMessage(websocket.BinaryMessage, wav_binary)
check(err)
// Tell the server the audio stream is finished.
err = ws.WriteMessage(websocket.TextMessage, []byte(`{"eof" : 1}`))
check(err)

_, msg, err := ws.ReadMessage() // <---- this takes 2+ seconds
check(err)
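A sketch of the full exchange, assuming the server answers every message it receives with exactly one JSON reply (one per audio chunk, then one for eof, which is how the repository's sample clients read it). Under that assumption, the single ReadMessage above returns the reply to the binary frame, not the reply to eof. `Recognize`, `Conn` and `fakeConn` are hypothetical names; `Conn` mirrors the method signatures of gorilla/websocket's `*websocket.Conn`, so a real connection could be passed instead of the fake. The eof frame is kept byte-for-byte as in the question, since the server may match it as a literal string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Frame types. The values match gorilla/websocket's TextMessage (1) and
// BinaryMessage (2), which are the RFC 6455 opcodes.
const (
	TextMessage   = 1
	BinaryMessage = 2
)

// Conn is the subset of gorilla's *websocket.Conn used here, so either a
// real connection or a test fake can be passed in.
type Conn interface {
	WriteMessage(messageType int, data []byte) error
	ReadMessage() (messageType int, p []byte, err error)
}

// Recognize streams pcm in chunks of chunkSize bytes (chunkSize > 0),
// reading one reply per message sent (assumed server behaviour), then sends
// eof and returns the "text" field of the reply to eof.
func Recognize(c Conn, pcm []byte, chunkSize int) (string, error) {
	for start := 0; start < len(pcm); start += chunkSize {
		end := start + chunkSize
		if end > len(pcm) {
			end = len(pcm)
		}
		if err := c.WriteMessage(BinaryMessage, pcm[start:end]); err != nil {
			return "", err
		}
		// Reply to this chunk (a partial or intermediate result); skipped here.
		if _, _, err := c.ReadMessage(); err != nil {
			return "", err
		}
	}
	if err := c.WriteMessage(TextMessage, []byte(`{"eof" : 1}`)); err != nil {
		return "", err
	}
	_, msg, err := c.ReadMessage() // reply to eof: the final result
	if err != nil {
		return "", err
	}
	var final struct {
		Text string `json:"text"`
	}
	if err := json.Unmarshal(msg, &final); err != nil {
		return "", err
	}
	return final.Text, nil
}

// fakeConn stands in for the server: it answers audio frames with a partial
// result and the eof frame with a final one.
type fakeConn struct {
	writes, reads int
	gotEOF        bool
}

func (f *fakeConn) WriteMessage(messageType int, data []byte) error {
	f.writes++
	f.gotEOF = messageType == TextMessage && string(data) == `{"eof" : 1}`
	return nil
}

func (f *fakeConn) ReadMessage() (int, []byte, error) {
	f.reads++
	if f.gotEOF {
		return TextMessage, []byte(`{"text" : "hello world"}`), nil
	}
	return TextMessage, []byte(`{"partial" : "hello"}`), nil
}

func main() {
	fc := &fakeConn{}
	text, err := Recognize(fc, make([]byte, 20000), 8000)
	fmt.Println(text, err, fc.writes, fc.reads) // hello world <nil> 4 4
}
```

Sending the audio in modest chunks rather than one large frame also lets partial results arrive while the user is still speaking.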

I want to write an application that starts voice recognition when the user presses the space key and stops when the user releases it. So when the key is released, I want to send a signal to Vosk telling it to stop waiting for further data and return the current result.

Is it possible? Thanks.

Environment: Linux, Docker, Go
CPU: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
Attachment: two_wav_file.zip
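For the push-to-talk flow described in the question, one possible structure, assuming a recorder goroutine delivers audio chunks on a channel and closes that channel when the space key is released: forward each chunk as it arrives, then send eof as soon as the channel closes, so the server is not left waiting for more audio. `pushToTalk` and the recorder stub are hypothetical; in a real client the two callbacks would wrap `ws.WriteMessage` with `websocket.BinaryMessage` and `websocket.TextMessage`.

```go
package main

import "fmt"

// eofMessage is the end-of-stream frame used in the question.
const eofMessage = `{"eof" : 1}`

// pushToTalk forwards each chunk from mic via sendAudio while the key is
// held. The recorder closes mic when the key is released; any chunks still
// buffered are drained first, then sendEOF asks for the final result
// instead of letting the server wait for more audio.
func pushToTalk(mic <-chan []byte, sendAudio func([]byte) error, sendEOF func() error) error {
	for chunk := range mic {
		if err := sendAudio(chunk); err != nil {
			return err
		}
	}
	return sendEOF()
}

func main() {
	mic := make(chan []byte, 4)
	// Hypothetical recorder: three 0.1 s chunks of 16 kHz 16-bit mono audio
	// (3200 bytes each), then the space key is released.
	go func() {
		for i := 0; i < 3; i++ {
			mic <- make([]byte, 3200)
		}
		close(mic)
	}()

	var sent []string
	err := pushToTalk(mic,
		func(p []byte) error { sent = append(sent, fmt.Sprintf("audio %d bytes", len(p))); return nil },
		func() error { sent = append(sent, "text "+eofMessage); return nil },
	)
	fmt.Println(sent, err)
}
```

Closing the channel, rather than signalling the release on a separate channel, guarantees that audio already recorded is sent before eof. The server's replies still need to be read, for example by a separate goroutine that keeps calling ReadMessage until the connection closes.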

nshmyrev commented 1 year ago

Do you open a new connection for this short 0.7-second file? There might be some initialization delays at startup; we are working on them.

aj3423 commented 1 year ago

Sorry, my bad. I allocated a large buffer and sent too many trailing bytes; that caused the long recognition time.