alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

different results for same input #33

Closed mosherayman closed 8 years ago

mosherayman commented 8 years ago

Hi

When I run the client.py script repeatedly on the same wave file, I get different results.

Most of the time, the difference is in the likelihood field of the returned hypotheses: the values vary slightly from one run to the next (on the same decoder).

Occasionally, the likelihood differences affect the sort order, so on subsequent calls the most likely hypothesis changes.

Is this expected? Thanks.

run 1: RESPONSE:{u'status': 0, u'segment-start': 0.0, u'segment-length': 4.52, u'total-length': 4.52, u'result': {u'hypotheses': [ {u'likelihood': 145.337, u'transcript': u'minus one twelve point ten minus one eleven point ten'}, {u'likelihood': 143.347, u'transcript': u'hi minus one twelve point ten minus one eleven point ten'}, {u'likelihood': 140.417, u'transcript': u'minus one twelve point ten minus one eleven point eight ten'}, {u'likelihood': 138.427, u'transcript': u'hi minus one twelve point ten minus one eleven point eight ten'}], u'final': True}, u'segment': 0, u'id': u'ceaeba27-d042-4aef-b576-781fc0eeb252'}

run 2: RESPONSE: {u'status': 0, u'segment-start': 0.0, u'segment-length': 4.52, u'total-length': 4.52, u'result': {u'hypotheses': [ {u'likelihood': 143.316, u'transcript': u'hi minus one twelve point ten minus one eleven point ten'}, {u'likelihood': 140.427, u'transcript': u'minus one twelve point ten minus one eleven point ten'}, {u'likelihood': 138.364, u'transcript': u'hi minus one twelve point ten minus one eleven point eight ten'}, {u'likelihood': 135.475, u'transcript': u'minus one twelve point ten minus one eleven point eight ten'}], u'final': True}, u'segment': 0, u'id': u'8fb2a55e-0d24-4707-8f1f-37e90c038643'}
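The reordering described above can be checked directly from the two logged responses. The sketch below copies the hypotheses from run 1 and run 2 into Python 3 dictionaries (dropping the u'' prefixes) and compares the top hypothesis and the per-transcript likelihood change between runs. The helpers `best_transcript` and `likelihood_deltas` are hypothetical and not part of client.py.

```python
from typing import Dict, List

# Transcripts that appear in both logged responses.
T = "minus one twelve point ten minus one eleven point ten"
T_EIGHT = "minus one twelve point ten minus one eleven point eight ten"

# Hypotheses copied from run 1 and run 2, in the order the server returned them.
RUN1 = [
    {"likelihood": 145.337, "transcript": T},
    {"likelihood": 143.347, "transcript": "hi " + T},
    {"likelihood": 140.417, "transcript": T_EIGHT},
    {"likelihood": 138.427, "transcript": "hi " + T_EIGHT},
]
RUN2 = [
    {"likelihood": 143.316, "transcript": "hi " + T},
    {"likelihood": 140.427, "transcript": T},
    {"likelihood": 138.364, "transcript": "hi " + T_EIGHT},
    {"likelihood": 135.475, "transcript": T_EIGHT},
]


def best_transcript(hyps: List[dict]) -> str:
    """Return the transcript with the highest likelihood."""
    return max(hyps, key=lambda h: h["likelihood"])["transcript"]


def likelihood_deltas(a: List[dict], b: List[dict]) -> Dict[str, float]:
    """Likelihood change (b minus a) for each transcript present in both n-best lists."""
    la = {h["transcript"]: h["likelihood"] for h in a}
    lb = {h["transcript"]: h["likelihood"] for h in b}
    return {t: round(lb[t] - la[t], 3) for t in la if t in lb}


if __name__ == "__main__":
    print("run 1 best:", best_transcript(RUN1))
    print("run 2 best:", best_transcript(RUN2))
    for transcript, delta in likelihood_deltas(RUN1, RUN2).items():
        print(f"{delta:+.3f}  {transcript}")
```

On this data the top hypothesis flips between runs: the two hypotheses without "hi" each lose about 4.9, while the "hi" variants change by less than 0.1.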

alumae commented 8 years ago

This is expected. I think it is caused by the dithering in the MFCC extraction, which is random (it adds a small amount of random noise to the signal). In fact, I was confused by the same behavior myself, but after debugging I concluded that it is expected.
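To illustrate why dithering makes repeated runs differ, here is a minimal sketch, not Kaldi's actual code. It assumes dithering means adding zero-mean Gaussian noise, scaled by a dither constant, to every sample, and it uses the log energy of a single synthetic frame as a simplified stand-in for the MFCC features. `FRAME`, `dither`, `log_energy` and `extract` are hypothetical names.

```python
import math
import random
from typing import List, Optional

# Hypothetical stand-in for one 25 ms frame of 16 kHz audio: a 440 Hz tone.
SAMPLE_RATE = 16000
FRAME = [1000.0 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(400)]


def dither(samples: List[float], dither_value: float, rng: random.Random) -> List[float]:
    """Add Gaussian noise with standard deviation dither_value to every sample."""
    if dither_value == 0.0:
        return list(samples)  # dithering disabled: output is deterministic
    return [s + dither_value * rng.gauss(0.0, 1.0) for s in samples]


def log_energy(frame: List[float]) -> float:
    """Log energy of the frame, a simplified stand-in for a cepstral feature."""
    return math.log(max(sum(s * s for s in frame), 1e-10))


def extract(samples: List[float], dither_value: float, seed: Optional[int] = None) -> float:
    """Toy 'feature extraction'; seed=None gives a fresh, unseeded RNG on every call."""
    rng = random.Random(seed)
    return log_energy(dither(samples, dither_value, rng))


if __name__ == "__main__":
    # Dithering on, no fixed seed: the two values differ slightly.
    print(extract(FRAME, 1.0), extract(FRAME, 1.0))
    # Dithering off: identical results, prints True.
    print(extract(FRAME, 0.0) == extract(FRAME, 0.0))
    # Dithering on with a fixed seed: identical results, prints True.
    print(extract(FRAME, 1.0, seed=7) == extract(FRAME, 1.0, seed=7))
```

The feature changes only slightly per run, but those small differences propagate into the decoder's likelihoods, which is consistent with the small score shifts and occasional n-best reordering reported above. In Kaldi itself, the frame-extraction options include a `--dither` constant (0.0 disables it), so setting it to 0 in the feature config used by the decoder should make runs reproducible; verify this against your Kaldi version and config, since turning dither off may also require adjusting the energy floor.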