2 incomplete results from a single .wav file

Joll59 / d-ser-t

d-ser-t quantifies speech recognition accuracy of the MSFT speech service and/or user created MSFT custom speech service models.


2 incomplete results from a single .wav file #4

Closed KatieProchilo closed 5 years ago

KatieProchilo commented 5 years ago

I transcribed a single .wav file whose expected transcription was stored in transcription.txt.
test_results.json contained 2 results instead of 1:
- Actual Result 1 (AR1):
  - Contained ~60% of the expected transcription.
  - Was followed by ~0.5 seconds of silence.
- Actual Result 2 (AR2):
  - Picked up exactly where AR1 left off.
  - Contained ~5% of the expected transcription.
  - Was not followed by any silence.
Even with the 2 results combined, the remaining ~35% of the .wav file was not transcribed.

Any single .wav file should produce exactly one result.
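
One possible way to end up with a single result per file, assuming the service reports a long recording as several recognized segments, is to concatenate the segments in arrival order. The sketch below is illustrative only: `Segment` and `mergeSegments` are hypothetical names that do not come from the d-ser-t code base, and this is not necessarily how the issue was eventually resolved.

```typescript
// Hypothetical shape: one recognized chunk of text, tagged with its source file.
interface Segment {
  file: string;
  text: string;
}

// Concatenate all segments that belong to the same file, in arrival order,
// so that each file maps to exactly one transcription string.
function mergeSegments(segments: Segment[]): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const segment of segments) {
    const text = segment.text.trim();
    if (text.length === 0) {
      continue; // skip empty segments, e.g. ones that only cover silence
    }
    const previous: string | undefined = merged[segment.file];
    merged[segment.file] = previous === undefined ? text : `${previous} ${text}`;
  }
  return merged;
}
```

Merging only addresses the "2 results" part of this report. It would not recover the ~35% of the audio that was never transcribed.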

KatieProchilo commented 5 years ago

NOTE: Whoever works on this should ask me for AnnouncersTest.wav and its transcription for easy repro.

KatieProchilo commented 5 years ago

Not only are there multiple partial responses for a single recording, but the partial responses are also inconsistent between runs. For example, with no code changes in between, here are partial transcriptions from two separate runs:

miracle spherical diabolical denizens of the deep who held stall
miracle spherical diabolical denizens of the deep who held stall around the corner of the quote

ovishesh commented 5 years ago

I did some digging into the SDK. We should be looking at the result returned from CRIS. We should only parse the response if e.result.reason == 0 and JSON.parse(e.result.json).NBest[0] is not undefined.

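To inspect both values, add this code to the recognizer.recognized handler in TranscriptionService:

```javascript
// Numeric reason code reported for this recognized event.
console.log(e.result.reason);
// Top hypothesis from the detailed JSON payload (undefined if NBest is empty).
console.log(JSON.parse(e.result.json).NBest[0]);
```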
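
The check described above could be wrapped in a small guard so that results without a usable hypothesis are skipped. This is a minimal sketch, not code from the repository. `extractTopHypothesis`, `RecognizedResult`, and `NBestEntry` are hypothetical names, the `Display` and `Lexical` fields are assumed to be present in the detailed JSON payload, and the expected reason value is passed in by the caller rather than hard-coded.

```typescript
// Assumed subset of the recognized event's result that this thread refers to.
interface RecognizedResult {
  reason: number;
  json: string;
}

// Assumed subset of one NBest entry in the detailed JSON payload.
interface NBestEntry {
  Display?: string;
  Lexical?: string;
}

// Hypothetical helper: returns the top hypothesis text, or undefined when the
// result should be skipped (wrong reason, malformed JSON, or no NBest[0]).
function extractTopHypothesis(
  result: RecognizedResult,
  expectedReason: number,
): string | undefined {
  if (result.reason !== expectedReason) {
    return undefined;
  }
  let parsed: { NBest?: NBestEntry[] } | null;
  try {
    parsed = JSON.parse(result.json);
  } catch {
    return undefined; // payload was not valid JSON
  }
  const top = parsed?.NBest?.[0];
  if (top === undefined) {
    return undefined; // NBest missing or empty
  }
  return top.Display ?? top.Lexical;
}
```

Inside the handler this would be called as `extractTopHypothesis(e.result, 0)`, where 0 is the value suggested in the comment above. That value is worth checking against the SDK's `ResultReason` enum before relying on it.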

Joll59 commented 5 years ago

fixed by #84