JamesBrill / react-speech-recognition

💬 Speech recognition for your React app
https://webspeechrecognition.com/
MIT License
645 stars 116 forks

stopListening() and abortListening() have no effect on continuous mode #132

Closed · mfkrause closed this 1 year ago

mfkrause commented 2 years ago

Hey,

thanks for the great module – I'm using Chrome 96 (haven't tested other browsers yet) and having some trouble with continuous mode. I previously used v2 of this module, which worked great, and recently upgraded to the latest v3. However, with continuous mode activated, neither stopListening() nor abortListening() (the latter being the one I actually want) seems to have any effect: the browser keeps listening. Just for the sake of it, I tested this:

const abortListeningOverride = async () => {
  console.log(listening);
  stopListening();
  await abortListening();
  console.log(listening);
  setTimeout(() => console.log(listening), 2000);
};

Whenever I call abortListeningOverride(), all three console.log()s print true.

Any idea why that is?

JamesBrill commented 2 years ago

Hi @mfkrause !

Whenever I call abortListeningOverride(), all three console.log()s print true.

listening is state generated from a React Hook, meaning it won't change until the component is re-rendered. So for the lifetime of that function, the value of listening will remain whatever it was when the function was called. All React Hook state is immutable during each render - it only changes on a re-render, by which time the component will have references to a different listening variable and a different abortListeningOverride callback. Even with your timeout, listening will not change within the scope of the function as the function has captured the old value of listening in a closure.
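
Here's a contrived sketch (plain React, nothing to do with this library) of the same stale-closure effect:

import React, { useState } from 'react';

function Counter() {
  const [count, setCount] = useState(0);

  const logLater = () => {
    setCount(count + 1);
    // This callback closed over `count` from the render it was created in,
    // so even after the state update above and after the timeout fires,
    // it logs the old value
    setTimeout(() => console.log(count), 2000);
  };

  return <button onClick={logLater}>Clicked {count} times</button>;
}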

As for why abortListening is not having any effect, I'm less sure. I added your code above to the Dictaphone example and it stopped the microphone as expected during continuous mode when I ran it locally. One difference was that I imported abortListening from the global SpeechRecognition object, like this:

  const abortListeningOverride = async () => {
    console.log(listening);
    SpeechRecognition.stopListening();
    await SpeechRecognition.abortListening();
    console.log(listening);
    setTimeout(() => console.log(listening), 2000);
  };

Can I see your full component to see how you are importing abortListening?

mfkrause commented 2 years ago

Hey @JamesBrill - thanks for the reply and the heads-up regarding the listening state variable. I clearly had some misconceptions about the state lifecycle there.

I've since refactored the whole thing and implemented a sort of "hack" to bypass this behavior: I now track whether the Web Speech API should be listening in a state variable of my own and regularly compare it to the listening value from the hook. If listening is false but the API should actually be listening (based on my own state variable), I call startListening() again. I'm still unsure where the underlying behavior originates, but this has worked pretty well for me so far.

One thing to note here (which I should've mentioned in the OP) is that I'm still using class components (it's quite an old customer project that hasn't been refactored to functional components) and have therefore wrapped this library in a HOC. The HOC itself looks something like this:

import React from 'react';
import SpeechRecognition, { ListeningOptions, useSpeechRecognition } from 'react-speech-recognition';

export default function SpeechRecognitionWrapper(Component: any) { // TODO: Proper type for Component
  return function WrappedComponent(props: any) {
    const {
      transcript,
      interimTranscript,
      finalTranscript,
      listening,
      resetTranscript,
      browserSupportsSpeechRecognition,
    } = useSpeechRecognition();

    const {
      stopListening,
      abortListening,
      getRecognition,
    } = SpeechRecognition;

    const recognition = getRecognition();

    // Always continuously listen in German
    const startListening = (options?: ListeningOptions) => {
      console.log('startListening was called.');
      let optionsOverride = options;
      if (!options) optionsOverride = {};
      if (!optionsOverride!.language) optionsOverride!.language = 'de-DE';
      if (optionsOverride!.continuous === undefined) optionsOverride!.continuous = true;
      SpeechRecognition.startListening({ ...optionsOverride });
      console.log('Started listening with options ', optionsOverride);
    };

    const abortListeningOverride = async () => {
      console.log('abortListening was called.');
      // Transcript should be freshly created every time we start listening again, so reset it here to be safe
      resetTranscript();
      // Somehow, abortListening() doesn't always work in continuous mode.
      // This while-loop looks dangerous, but does the trick.
      // However, this takes quite a while sometimes (a second or so) in Safari. The app freezes during this time.
      while (listening) {
        // eslint-disable-next-line no-await-in-loop
        await abortListening();
      }
      console.log('Successfully aborted listening.');
    };

    console.log('RECV Transcript: ', transcript);
    if (finalTranscript.trim().length > 0) console.log('RECV Final Transcript: ', finalTranscript);

    return (
      <Component
        transcript={transcript}
        interimTranscript={interimTranscript}
        finalTranscript={finalTranscript}
        listening={listening}
        resetTranscript={resetTranscript}
        startListening={startListening}
        stopListening={stopListening}
        abortListening={abortListeningOverride}
        browserSupportsSpeechRecognition={browserSupportsSpeechRecognition}
        recognition={recognition}
        // eslint-disable-next-line react/jsx-props-no-spreading
        {...props}
      />
    );
  };
}

... which then wraps this Component, where you can also see the hack mentioned above:

import React, { useEffect, useState } from 'react';
import {
  Switch,
  Route,
} from 'react-router-dom';
import '@egjs/react-flicking/dist/flicking.css';

import AppContext from '../../utils/context';
import SpeechRecognition from '../../utils/hoc/SpeechRecognitionWrapper';
import * as Types from '../../utils/types';
import ErrorBoundary from '../../components/ErrorBoundary';
// importing views...

interface AllSwitchProps extends Types.SpeechRecognition {
  audio: HTMLAudioElement;
}
function AllSwitch(props: AllSwitchProps) {
  const {
    audio,
    transcript,
    startListening,
    stopListening,
    abortListening,
    finalTranscript,
    interimTranscript,
    resetTranscript,
    listening,
    browserSupportsSpeechRecognition,
    recognition,
  } = props;

  const [visionTest, setVisionTest] = useState('');
  const [avatar, setAvatar] = useState('');
  const [scaleFactor, setScaleFactor] = useState(0);
  const [useSpeechRecognition, setUseSpeechRecognition] = useState<boolean | null>(null);
  const [result, setResult] = useState('warning');
  const [usedNumbers, setUsedNumbers] = useState<Types.UsedNumbers>([]);
  const [recognizedNumbers, setRecognizedNumbers] = useState<(number | null)[]>([]);
  const [reachedScore, setReachedScore] = useState<Types.ReachedScore>(0);
  const [speechRecognitionShouldBeListening, setSpeechRecognitionShouldBeListening] = useState<boolean>(false);

  const overriddenStartListening = () => {
    startListening();
    setSpeechRecognitionShouldBeListening(true);
  };

  const overriddenStopListening = () => {
    stopListening();
    setSpeechRecognitionShouldBeListening(false);
  };

  const overriddenAbortListening = () => {
    abortListening();
    setSpeechRecognitionShouldBeListening(false);
  };

  const speechRecognition = {
    transcript,
    finalTranscript,
    interimTranscript,
    resetTranscript,
    listening,
    startListening: overriddenStartListening,
    stopListening: overriddenStopListening,
    abortListening: overriddenAbortListening,
    browserSupportsSpeechRecognition,
    recognition,
  };

  useEffect(() => {
    const interval = window.setInterval(() => {
      if (speechRecognitionShouldBeListening && !listening) {
        console.log('WebSpeech API is not listening even though it should. Trying to restart...');
        startListening();
      }
    }, 1000);

    return () => {
      clearInterval(interval);
    };
  }, [listening, speechRecognitionShouldBeListening]);

  return (
    <Switch>
      <ErrorBoundary>
        <AppContext.Provider
          value={{
            visionTest,
            setVisionTest,
            avatar,
            setAvatar,
            scaleFactor,
            setScaleFactor,
            useSpeechRecognition,
            setUseSpeechRecognition,
            result,
            setResult,
            usedNumbers,
            setUsedNumbers,
            recognizedNumbers,
            setRecognizedNumbers,
            reachedScore,
            setReachedScore,
            speechRecognitionShouldBeListening,
            setSpeechRecognitionShouldBeListening,
          }}
        >
          {/* Routes... */}
        </AppContext.Provider>
      </ErrorBoundary>
    </Switch>
  );
}

export default SpeechRecognition(AllSwitch);

So for now, this is fixed on my end (even though the fix really is hacky) and can be closed. I probably messed something up state-wise in the HOC implementation.

JamesBrill commented 2 years ago

In this code, you will never reach the console.log when listening is true:

      while (listening) {
        await abortListening();
      }
      console.log('Successfully aborted listening.');

To recap, listening is state emitted by a React Hook and will be unchanged for the lifetime of your abortListeningOverride callback due to the closure around its value. When its value changes, the component will re-render and a new instance of your callback will be created with a closure around the new value. But for the original instance of your callback, listening will still have the old value. So you effectively have an infinite loop here and your function never returns. The reason it doesn't crash your app completely is that abortListening doesn't resolve when listening has already been aborted - this is a slight bug, as it should resolve immediately when the microphone isn't listening (I have a GitHub issue to clean up some of abortListening's wonky behaviour). Instead, it just hangs forever, which is why your infinite loop doesn't burn out your CPU.
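
If you do want an override that also resets the transcript, a single await is all you need - roughly this (a sketch using the global import, as in my earlier snippet):

const abortListeningOverride = async () => {
  resetTranscript();
  // One awaited call is enough - no need to poll the hook's `listening` state
  await SpeechRecognition.abortListening();
};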

I'm still baffled by the need for the workaround (and the HOC). When I wrap my own components with your HOC and swap your override for plain SpeechRecognition.abortListening, it aborts continuous listening as expected when I click on my abort button. Have you tried the vanilla Dictaphone examples from the docs and tested SpeechRecognition.abortListening on a simple button there?

One thing to note here (which I should've already in the OP) is that I'm still using class components (it's quite an old customer project that hasn't been refactored to functional components) and have therefore wrapped this library in a HOC.

If the consumer component AllSwitch is already a function component using Hooks, why not just use the speech recognition hook directly in AllSwitch rather than making a HOC? Something like the sketch below.
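
Roughly (a sketch trimmed down from your code above):

import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition';

function AllSwitch(props: { audio: HTMLAudioElement }) {
  const {
    transcript,
    interimTranscript,
    finalTranscript,
    listening,
    resetTranscript,
    browserSupportsSpeechRecognition,
  } = useSpeechRecognition();

  // Always listen continuously in German
  const startListening = () =>
    SpeechRecognition.startListening({ language: 'de-DE', continuous: true });

  // ...the rest of the component as before, without the HOC plumbing
}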

One thing I couldn't see in your code is where the abort callback is triggered in the first place. I'm assuming there's a button or condition that calls it - can I see the code for that?

Overall, that while loop is dangerous (potentially both memory and CPU impact) and I am sure this wrapper is unnecessary. Would love to help you reach a cleaner solution if I can.