JamesBrill / react-speech-recognition

💬 Speech recognition for your React app
https://webspeechrecognition.com/
MIT License
657 stars · 119 forks

Support polyfills #86

Closed · JamesBrill closed this 3 years ago

JamesBrill commented 3 years ago

The first support for polyfill integration! Highly experimental stuff. This allows the Speech Recognition engine to be swapped out for a polyfill. Tested with an Azure polyfill; the integration is documented below.

Polyfills

If you want react-speech-recognition to work on more browsers than just Chrome, you can integrate a polyfill: a piece of code that fills in a feature that some browsers are missing.

Under the hood, the Web Speech API in Chrome uses Google's speech recognition servers. To replicate this functionality elsewhere, you will need to host your own speech recognition service and implement the Web Speech API on top of it. That implementation, which is essentially a polyfill, can then be plugged into react-speech-recognition. You can write that polyfill yourself, but it's recommended you use one that someone else has already made.
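Whether a polyfill is needed at all can be checked up front. A minimal sketch (the helper name here is made up for illustration, not part of any library):

```javascript
// Returns the browser's native SpeechRecognition constructor when one
// exists, or null when a polyfill would be required (e.g. in Firefox,
// or when not running in a browser at all).
function getNativeSpeechRecognition() {
  if (typeof window === 'undefined') return null; // not running in a browser
  return window.SpeechRecognition || window.webkitSpeechRecognition || null;
}
```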

Basic usage

The SpeechRecognition class exported by react-speech-recognition has the method applyPolyfill. This takes an implementation of the W3C SpeechRecognition specification. From then on, that implementation will be used by react-speech-recognition to transcribe speech picked up by the microphone.

SpeechRecognition.applyPolyfill(SpeechRecognitionPolyfill)

Note that this type of polyfill, which does not pollute the global scope, is known as a "ponyfill" - the distinction is explained here. react-speech-recognition will also pick up traditional polyfills - just make sure you import them before react-speech-recognition.
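The ponyfill pattern can be illustrated in miniature: instead of patching a global, the replacement implementation is handed over explicitly. This toy module mirrors the spirit of applyPolyfill under that assumption; the real library's internals may differ.

```javascript
// A tiny recognition manager demonstrating the ponyfill pattern:
// no engine exists until an implementation is explicitly applied.
function makeRecognitionManager() {
  let Impl = null; // no SpeechRecognition implementation yet
  return {
    // Accept any class implementing the SpeechRecognition interface
    applyPolyfill(Replacement) {
      Impl = Replacement;
    },
    // Instantiate the currently applied implementation
    create() {
      if (!Impl) throw new Error('no SpeechRecognition implementation available');
      return new Impl();
    },
  };
}
```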

Usage recommendations

Polyfill libraries

Rather than roll your own, you should use a ready-made polyfill for one of the major cloud providers' speech recognition services.

Azure Cognitive Services

This is Microsoft's offering for speech recognition (among many other features). The free trial gives you $200 of credit to get started. It's pretty easy to set up - see the documentation.

Here is a basic example combining web-speech-cognitive-services and react-speech-recognition to get you started. This code worked with version 7.1.0 of the polyfill in February 2021 - if it has become outdated due to changes in the polyfill or in Azure Cognitive Services, please raise a GitHub issue or PR to get this updated.

import React, { useEffect, useState } from 'react';
import createSpeechServicesPonyfill from 'web-speech-cognitive-services';
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition';

const SUBSCRIPTION_KEY = '<INSERT_SUBSCRIPTION_KEY_HERE>';
const REGION = '<INSERT_REGION_HERE>';
const TOKEN_ENDPOINT = `https://${REGION}.api.cognitive.microsoft.com/sts/v1.0/issuetoken`;

const Dictaphone = () => {
  const [loadingSpeechRecognition, setLoadingSpeechRecognition] = useState(true);
  const { transcript, resetTranscript } = useSpeechRecognition();

  const startListening = () => SpeechRecognition.startListening({
    continuous: true,
    language: 'en-US'
  });

  useEffect(() => {
    const loadSpeechRecognition = async () => {
      // Exchange the subscription key for a short-lived authorization token
      const response = await fetch(TOKEN_ENDPOINT, {
        method: 'POST',
        headers: { 'Ocp-Apim-Subscription-Key': SUBSCRIPTION_KEY }
      });
      const authorizationToken = await response.text();
      // Build the Azure-backed ponyfill and hand it to react-speech-recognition
      const {
        SpeechRecognition: AzureSpeechRecognition
      } = await createSpeechServicesPonyfill({
        credentials: {
          region: REGION,
          authorizationToken,
        }
      });
      SpeechRecognition.applyPolyfill(AzureSpeechRecognition);
      setLoadingSpeechRecognition(false);
    };
    loadSpeechRecognition();
  }, []);

  if (loadingSpeechRecognition) {
    return null;
  }

  return (
    <div>
      <button onClick={startListening}>Start</button>
      <button onClick={SpeechRecognition.stopListening}>Stop</button>
      <button onClick={resetTranscript}>Reset</button>
      <p>{transcript}</p>
    </div>
  );
};
export default Dictaphone;

Caveats

AWS Transcribe

There is no polyfill for AWS Transcribe in the ecosystem yet, though a promising project can be found here.

Providing your own polyfill

If you want to roll your own implementation of the Speech Recognition API, follow the W3C SpeechRecognition specification. At a minimum, react-speech-recognition needs the core of that interface to work: the start, stop and abort methods, and the onresult and onend event handlers.

Kundannetset commented 10 months ago

@JamesBrill how to get app id