alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License
1.07k stars 341 forks

When would we expect multiple "final" events? #41

Closed: farmnerd closed this issue 8 years ago

farmnerd commented 8 years ago

Hi,

In the send_event function in master_server.py, there's a check, if len(event["result"]["hypotheses"]) > 0 and event["result"]["final"]:, and when it passes, the first hypothesis's transcript is appended to the eventual final_hyp. Why is this append necessary? If the result is marked as final, why wouldn't that just be used as the final hyp?

The real motivation for the question is that I'm trying to return more than just the top transcript (e.g. n-best, or confidence scores), so I was thinking of having final_hyp just be event["result"]["hypotheses"] directly, and then building the response from that data. But if there's an underlying reason for the transcript-appending functionality, then maybe that indicates my thinking is not right.

Thanks!

alumae commented 8 years ago

When the worker is doing speech segmentation, there can be multiple final hypotheses.

Note that the send_event() function belongs to HttpChunkedRecognizeHandler, which provides a simple HTTP POST-based interface. A user could send a long audio file to that address, and the file could consist of many speech segments. The HttpChunkedRecognizeHandler sends the response back to the user only once all the uploaded audio has been processed, i.e., it cannot send a new hypothesis every time it gets a final hypothesis from the worker; instead it has to append the final hyps and send them all out at the end.
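
To illustrate the accumulation described above, here is a minimal Python sketch. It is not the code from master_server.py: the FinalResultAccumulator class and its method names are hypothetical, and the only event fields it relies on are the ones mentioned in this thread (event["result"]["hypotheses"], event["result"]["final"], and a "transcript" key on each hypothesis). It keeps the whole hypotheses list for every final segment, so the top transcript can still be joined at the end while n-best data stays available.

```python
class FinalResultAccumulator:
    """Hypothetical helper, not part of kaldi-gstreamer-server."""

    def __init__(self):
        # One entry per finalized speech segment; each entry is that
        # segment's full list of hypotheses (best hypothesis first).
        self.final_segments = []

    def on_event(self, event):
        result = event.get("result")
        if not result or not result.get("final"):
            return  # ignore partial results and events without a result
        hypotheses = result.get("hypotheses", [])
        if hypotheses:
            self.final_segments.append(hypotheses)

    def transcript(self):
        # Join the top transcript of each final segment, in arrival order.
        return " ".join(seg[0]["transcript"] for seg in self.final_segments)


# Assumed event shapes, for illustration only.
acc = FinalResultAccumulator()
acc.on_event({"result": {"hypotheses": [{"transcript": "hello wor"}], "final": False}})
acc.on_event({"result": {"hypotheses": [{"transcript": "hello world."},
                                        {"transcript": "hello word."}], "final": True}})
acc.on_event({"result": {"hypotheses": [{"transcript": "how are you."}], "final": True}})
print(acc.transcript())
```

With this shape, a response built once all audio has been processed could return either acc.transcript() or the per-segment lists in acc.final_segments, depending on whether the client needs only the best transcript or the alternatives as well.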

farmnerd commented 8 years ago

I see - thanks!