Joll59 / d-ser-t

d-ser-t quantifies the speech recognition accuracy of the MSFT speech service and/or user-created MSFT custom speech service models.

2 incomplete results from a single .wav file #4

Closed KatieProchilo closed 5 years ago

KatieProchilo commented 5 years ago

Any single .wav file should produce exactly one result.

KatieProchilo commented 5 years ago

NOTE: Whoever works on this should ask me for AnnouncersTest.wav and its transcription for easy repro.

KatieProchilo commented 5 years ago

Not only are there multiple partial responses for a single recording, but the partial responses are inconsistent. For example, without changes to the code, here are partial transcriptions from two separate runs:

ovishesh commented 5 years ago

So, I did some digging into the SDK. We should be looking at the result that comes back from CRIS, and we should only parse the response if e.result.reason == 0 and JSON.parse(e.result.json).NBest[0] is not undefined.

Add this code to the recognizer.recognized handler in TranscriptionService to log both values:


console.log(e.result.reason);
console.log(JSON.parse(e.result.json).NBest[0]);
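
Once the logging confirms the pattern, the check described above could be wrapped in a small guard. This is only a rough sketch, not the fix that was merged: `topNBestEntry` is a hypothetical helper name, and the default of 0 for the expected reason simply mirrors the value in this comment. It is left as a parameter so the caller can pass whichever `ResultReason` value the SDK reports for a final recognition.

```javascript
// Hypothetical helper (not part of d-ser-t or the Speech SDK).
// `expectedReason` defaults to 0 only because the comment above uses that value.
function topNBestEntry(result, expectedReason = 0) {
  // Ignore results whose reason does not match the expected one.
  if (!result || result.reason !== expectedReason) {
    return undefined;
  }

  let parsed;
  try {
    parsed = JSON.parse(result.json);
  } catch (err) {
    // A missing or malformed payload is treated as "no usable result".
    return undefined;
  }

  // NBest may be absent or empty; only return a defined first entry.
  if (!parsed || !Array.isArray(parsed.NBest) || parsed.NBest[0] === undefined) {
    return undefined;
  }
  return parsed.NBest[0];
}

// Possible use inside the handler (assumed shape of TranscriptionService):
// recognizer.recognized = (sender, e) => {
//   const best = topNBestEntry(e.result);
//   if (best !== undefined) { /* record exactly one result for this file */ }
// };
```

Returning undefined for every rejected case keeps the handler to a single check, so a .wav file only contributes a result when both conditions hold.
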

Joll59 commented 5 years ago

fixed by #84