Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

JS: continuous speech recognition not invoking `recognised` event handler. #922

Closed BryanDollery closed 3 years ago

BryanDollery commented 3 years ago

I'm writing yet another speech transcription/translation app in JavaScript for the browser. I have imported "microsoft-cognitiveservices-speech-sdk" (v1.14.1) for this and configured the Speech API in my Azure resource group, along with the text translation cognitive service. So far, so good. Translation works like a dream. Transcription (speech-to-text, or STT) has been more difficult, but I'm finally getting there.

However, I have a problem with it. I am using continuous recognition directly from the default microphone, starting the process with the startContinuousRecognitionAsync() method. Whilst the recognizing event handler is invoked during the session, the recognized event never fires for me.

import { SpeechConfig, AudioConfig, SpeechRecognizer, ResultReason, NoMatchReason, NoMatchDetails } from "microsoft-cognitiveservices-speech-sdk";
const speech = {
  subscriptionKey: "...",
  region: "westeurope",
  speechConfig: null,
  audioConfig: null,
  recognizer: null,
  register: () => {
    console.log(`Starting...`);
    speech.speechConfig = SpeechConfig.fromSubscription(speech.subscriptionKey, speech.region);
    speech.audioConfig = AudioConfig.fromDefaultMicrophoneInput();
    speech.recognizer = new SpeechRecognizer(speech.speechConfig, speech.audioConfig);
  },
  rec: () => {
    speech.recognizer.recognized = speech.done;
    speech.recognizer.recognizing = speech.recognizing;
    speech.recognizer.startContinuousRecognitionAsync();
    console.log("Recording...");
  },
  done: (s, e) => {
    console.log(`SSS: ${JSON.stringify(s, null, 4)}`);
    console.log(`EEE: ${JSON.stringify(e, null, 4)}`);

    if (e.result.reason === ResultReason.NoMatch) {
      const noMatchDetail = NoMatchDetails.fromResult(e.result);
      console.log("DDD (recognized)  Reason: " + ResultReason[e.result.reason] + " | NoMatchReason: " + NoMatchReason[noMatchDetail.reason]);
    } else {
      console.log(`PPP (recognized)  Reason: ${ResultReason[e.result.reason]} | Duration: ${e.result.duration} | Offset: ${e.result.offset}`);
      console.log(`Text: ${e.result.text}`);
    }
  },
  recognizing: (s, e) => {
    console.log(`BBB: ${e.result.text}`);
  },
  stop: () => {
    speech.recognizer.stopContinuousRecognitionAsync(r => {
      console.log(`TTT: stopped | ${r}`);
    });
  }
}

The logs are filled with unique search strings to help me search for the code that generated them, e.g. 'TTT'.

I won't bore you with the code needed to invoke this object -- it sits in a React app with all sorts of complexities. It basically calls speech.register() once, then when the user wants to start STT it calls speech.rec(), followed some time later by speech.stop(). I expect recognizing() to be invoked during the session, and recognized() (the done() method in my object) to be called once the operation has completed because speech.recognizer.stopContinuousRecognitionAsync() has been invoked. I have also registered an error handler. That isn't being invoked either, but it is not important for this conversation. I have paused my code in the Chrome debugger and confirmed that my handlers are properly registered with the recognizer.

This sort-of works. I see the results of the recognizing method (and they're quite impressive -- even transcribing Jabberwocky quite well). But I really need to hook into the final event when the recording is stopped and the full text becomes available. It would be a lot of hard work to use the output of the partial in-progress transcription event (recognizing()). I expect the recognized() event handler to be invoked with the full contents of the transcription from start to finish, but it isn't being invoked at all.
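On the expectation above: as far as I understand the SDK's continuous mode (this is not established anywhere in this thread), recognized delivers a final result for each finished utterance rather than a single result for the whole session after stop. If that holds, the full text has to be assembled by the caller. The following is a minimal sketch of that idea. FakeRecognizer, emitFinal and collectTranscript are hypothetical names used only so the example runs without the SDK or a microphone; the only thing taken from the code above is the recognized property and the e.result.text field.

```javascript
// Hypothetical stand-in for the SDK recognizer, so the sketch runs offline.
class FakeRecognizer {
  constructor() {
    this.recognized = null; // same property name the code above assigns to
  }

  // Simulates one finalized utterance arriving from the service.
  emitFinal(text) {
    if (this.recognized) {
      this.recognized(this, { result: { text } });
    }
  }
}

// Registers a recognized handler that collects each final segment.
// Returns a function that joins everything collected so far.
function collectTranscript(recognizer) {
  const segments = [];
  recognizer.recognized = (_sender, e) => {
    // Skip empty results (for example, silence with no match).
    if (e.result && e.result.text) {
      segments.push(e.result.text);
    }
  };
  return () => segments.join(" ");
}

const demoRecognizer = new FakeRecognizer();
const getTranscript = collectTranscript(demoRecognizer);

demoRecognizer.emitFinal("'Twas brillig, and the slithy toves");
demoRecognizer.emitFinal("did gyre and gimble in the wabe.");

// Call this from the stop callback to read the full text.
const fullTranscript = getTranscript();
console.log(fullTranscript);
```

With the real recognizer, getTranscript() would be called inside the stopContinuousRecognitionAsync callback, assuming all final segments have arrived by then.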

Any help would be great. I have a deadline for completion of the project in a fortnight and this is the final part to get working. The translation works great and the transcription seems really promising. Thanks, Bryan.

BryanDollery commented 3 years ago

Sorry -- this wasn't a bug -- it was my fault entirely. The problem was that in trying to understand the SDK I had stringified the first parameter of the recognized() event handler, documented simply as s. It turns out that this is the recognizer itself and stringifying it causes a silent error. As this was the first line of the event-handler it was simply failing silently. I have fixed my code and I'm moving on.
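For anyone landing here with the same symptom, the failure can be sketched without the SDK. The assumption, which is mine and not confirmed in this thread, is that the recognizer holds references back to itself, so JSON.stringify throws a TypeError on it, and that the exception is swallowed somewhere between the SDK and the handler. fakeRecognizer and fakeEvent below are hypothetical stand-ins shaped like the arguments used in the original done() handler.

```javascript
// Hypothetical stand-in for the recognizer: an object that refers to itself.
// Whether the real recognizer is circular in this way is an assumption.
const fakeRecognizer = { name: "recognizer" };
fakeRecognizer.self = fakeRecognizer;

// Hypothetical event with the fields the original handler reads.
const fakeEvent = { result: { text: "hello world", duration: 100, offset: 0 } };

// Original pattern: the first line throws, so the rest never runs.
function brokenDone(s, e) {
  console.log(`SSS: ${JSON.stringify(s, null, 4)}`);
  return e.result.text;
}

// Fixed pattern: leave the sender alone and read only plain result fields.
function fixedDone(_s, e) {
  const { text, duration, offset } = e.result;
  console.log(`Text: ${text} | Duration: ${duration} | Offset: ${offset}`);
  return text;
}

// Capture the error explicitly so it is visible instead of silent.
let brokenError = null;
try {
  brokenDone(fakeRecognizer, fakeEvent);
} catch (err) {
  brokenError = err;
  console.error(`Handler failed: ${err.name}`);
}

const fixedText = fixedDone(fakeRecognizer, fakeEvent);
```

Wrapping the body of each event handler in a try/catch like this while debugging makes such failures show up in the console rather than disappearing.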