Joll59 / d-ser-t

d-ser-t quantifies the speech recognition accuracy of the MSFT speech service and/or user-created MSFT custom speech service models.

Special characters that are currently unaccounted for will be logged #28

Closed KatieProchilo closed 5 years ago

KatieProchilo commented 5 years ago

Rebased and ready for review.

Created a new class, TranscriptionAnalysisService, that validates, cleans, and analyzes transcriptions. This functionality already existed and was only moved; the one new addition is pushUnhandledOutput.

pushUnhandledOutput inspects the actual transcriptions received from the STT service. If a transcription contains a special character that the harness does not currently handle, the character, the word it appeared in, and the full transcription it came from are logged in unhandledSTTOutput.json.
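For reviewers who want a concrete picture, here is a minimal sketch of the check described above. It is not the code in this PR: the `UnhandledEntry` shape, the `findUnhandledCharacters` name, and the `HANDLED` regex are all hypothetical, and the set of characters the harness actually handles may differ. The sketch only collects entries in memory; writing them to unhandledSTTOutput.json is left out.

```typescript
// Hypothetical shape of one logged entry (an assumption, not the PR's actual schema).
interface UnhandledEntry {
  character: string;      // the special character that was not handled
  word: string;           // the word the character appeared in
  transcription: string;  // the full transcription returned by the STT service
}

// Assumption: letters, digits, whitespace, and apostrophes count as "handled".
// The real regex in the harness may cover more or fewer characters.
const HANDLED = /[a-z0-9\s']/i;

function findUnhandledCharacters(transcription: string): UnhandledEntry[] {
  const entries: UnhandledEntry[] = [];
  // Split on whitespace and drop empty strings from leading/trailing spaces.
  const words = transcription.split(/\s+/).filter((w) => w.length > 0);
  for (const word of words) {
    for (const character of word) {
      if (!HANDLED.test(character)) {
        entries.push({ character, word, transcription });
      }
    }
  }
  return entries;
}

// Example: "$" and "%" fall outside the assumed handled set.
const found = findUnhandledCharacters("it costs $5 at 50% off");
console.log(JSON.stringify(found, null, 2));
```

Recording the word and the full transcription alongside the character keeps enough context in each entry to see how the STT service used the character.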

A human can later review this output, identify any patterns, and add them to an existing regex so that the harness handles those characters going forward.