alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License
1.07k stars 341 forks

calculate and return hypothesis confidence number #42

Open farmnerd opened 8 years ago

farmnerd commented 8 years ago

This PR proposes including a confidence number in the HTTP JSON response. The confidence algorithm comes from some commits to the sample full postprocessor: a hypothesis whose Kaldi likelihood is much higher relative to the next hypothesis's likelihood gets a higher confidence number, while hypotheses whose likelihoods are closer together get a lower one.
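To make the idea concrete, here is a minimal sketch of a gap-based confidence score. It is not the formula used in the PR or in the sample full postprocessor; the function name `gap_confidence` and the mapping `1 - exp(-gap)` are assumptions chosen only to show the behaviour described above (a bigger gap between the best and second-best likelihood gives a higher number).

```python
import math


def gap_confidence(log_likelihoods):
    """Map the gap between the two best log-likelihoods to a score in [0, 1].

    Hypothetical helper for illustration; the PR's actual formula may differ.
    """
    if not log_likelihoods:
        raise ValueError("need at least one hypothesis")

    # Highest likelihood first.
    ranked = sorted(log_likelihoods, reverse=True)

    if len(ranked) == 1:
        # Assumption: with no competing hypothesis, report full confidence.
        return 1.0

    gap = ranked[0] - ranked[1]  # always >= 0 after sorting
    # gap == 0 gives 0.0; the score approaches 1.0 as the gap grows.
    return 1.0 - math.exp(-gap)
```

With this mapping, two hypotheses with equal likelihoods score 0.0, and the score rises towards 1.0 as the best hypothesis pulls away from the runner-up.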

A couple points to notice when considering this PR:

Comments or suggestions welcome - thanks!

alumae commented 8 years ago

Sorry for not checking out this PR earlier. This seems like a good idea. However, it seems to me that the confidence for a multi-segment hypothesis should be the product of the confidences of the individual segments. Confidences are like probabilities, and when you combine the probabilities of multiple events, you multiply them. Or do you have another viewpoint, perhaps from a practical perspective? Of course, this would mean that the confidence of a long multi-segment utterance will be very small, but it seems to me that this reflects reality (after all, the longer the utterance, the less sure you can be that every word is correct).
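A minimal sketch of that suggestion, assuming each segment already has a confidence in [0, 1] and that the segments can be treated as independent events (the function name `utterance_confidence` is hypothetical, not part of the server code):

```python
import math


def utterance_confidence(segment_confidences):
    """Combine per-segment confidences by multiplying them.

    Hypothetical helper; assumes independent segments with scores in [0, 1].
    """
    for c in segment_confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence out of range: %r" % (c,))
    # math.prod requires Python 3.8+; an empty list yields 1.0.
    return math.prod(segment_confidences)
```

For example, three segments at 0.9 each combine to about 0.729, and twenty such segments drop to roughly 0.12, which illustrates how quickly the product shrinks for long utterances.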