alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

The intended way to inform clients about the end of recognition #81

Closed by sashker 3 years ago

sashker commented 3 years ago

Hello.

It's not clear from the source code of the server and clients how we should know that the server is not going to send anything more. We send {"eof" : 1} to the server when there are no more audio chunks, but there is nothing similar from the server side.

Is a connection close with status code 1000 what we should treat as the end of recognition?

Thanks in advance.

nshmyrev commented 3 years ago

The server always sends a response to each message: one client message, one response, more like RPC. If you send eof, it will send one response and that's it.
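The request/response pattern described here could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `transcribe` is a hypothetical helper, and `ws` stands for any already-open connection object with async `send` and `recv` methods. The only protocol details taken from this thread are the `{"eof" : 1}` message and the one-response-per-message behaviour.

```python
import asyncio
import json


async def transcribe(ws, chunks):
    """Send audio chunks and collect exactly one reply per message.

    `ws` is a hypothetical connection object exposing async `send(data)`
    and `recv()`; it is assumed to be connected to the server already.
    """
    responses = []
    for chunk in chunks:
        await ws.send(chunk)                  # one client message...
        reply = await ws.recv()               # ...one server response
        responses.append(json.loads(reply))

    # Tell the server there is no more audio, then read the single
    # response that follows. Nothing else is expected after it.
    await ws.send(json.dumps({"eof": 1}))
    final = json.loads(await ws.recv())
    return responses, final
```

With this shape the client does not need to wait for a close frame: once the reply to eof has arrived, it can close the connection itself.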

sskorol commented 3 years ago

@sashker if you continuously stream audio from the client, the only things you should care about are whether you received an interim or a final transcription, and whether it is empty or not. In general, if you see a result key in the response payload, it means you received a final transcription (which might be treated as a logical pause). That's basically it. The server can't predict whether a client is going to send more chunks for transcription. So as long as your connection is still open, the response payload is the single source of truth.
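A minimal sketch of that check in Python, assuming the payload is a JSON object. The `result` key comes from the comment above; treating a `text` key as final and reading interim text from a `partial` key are assumptions about the payload shape, and `classify` is a hypothetical helper name.

```python
import json


def classify(payload):
    """Return (kind, text) for one server response.

    kind is "final" when the payload carries a `result` key (per this
    thread) or a `text` key (assumption), and "interim" otherwise.
    """
    msg = json.loads(payload)
    if "result" in msg or "text" in msg:
        return "final", msg.get("text", "")
    # Assumption: interim responses carry their text under `partial`.
    return "interim", msg.get("partial", "")
```

An empty string in the second position means the server recognized nothing in that segment, which a client can simply skip.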

sashker commented 3 years ago

Thank you @sskorol and @nshmyrev for helping me. I managed to fix my client, so it works now. I think the example golang client doesn't work correctly at the moment. I'll try to fix it and make a PR.