Any idea on how we could update a transcript?

sb5512 commented 5 years ago

For example, when I say "Hello how are you", I want to delete the word "you" so that I only have "Hello how are". If I then speak again and say "I am good", my new transcript should be "Hello how are I am good".

JamesBrill commented 5 years ago

Some ideas...

Approach 1

The first approach is to pass the entire transcript prop through your own custom function that modifies it in some way (e.g. deleting "you"s). i.e.

<p>{transformTranscript(this.props.transcript)}</p>

However, this reprocesses the whole transcript on every render, so it could get computationally expensive quickly and is probably only useful for short speech.
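As a rough illustration of Approach 1, here is a minimal sketch of what `transformTranscript` might look like. The function body and the `wordsToRemove` parameter are assumptions for illustration only; they are not part of react-speech-recognition.

```javascript
// Hypothetical helper (not part of the library): removes every occurrence
// of the given words from a transcript, ignoring case.
function transformTranscript(transcript, wordsToRemove = ['you']) {
  const blocked = new Set(wordsToRemove.map((word) => word.toLowerCase()));
  return transcript
    .split(/\s+/) // split on any run of whitespace
    .filter((word) => word !== '' && !blocked.has(word.toLowerCase()))
    .join(' ');
}

console.log(transformTranscript('Hello how are you')); // "Hello how are"
```

Because this runs over the full transcript each time it is called, its cost grows with the length of the speech.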

Approach 2

The second approach is to maintain your own transcript in your component state and add new parts to it from Speech Recognition piece-by-piece. So the transcript you render would be your own:

<p>{this.state.customTranscript}</p>

To add to this transcript, you could listen for when the user stops speaking. I'll explain how to listen for that shortly. On that event, you have an opportunity to modify the new transcript:

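onSpeechEnd() {
  const { transcript, resetTranscript } = this.props;
  const modifiedTranscript = transformTranscript(transcript);
  // Use the functional form of setState and join with a space so the
  // new segment does not run into the previous one.
  this.setState(({ customTranscript }) => ({
    customTranscript: [customTranscript, modifiedTranscript].filter(Boolean).join(' '),
  }));
  resetTranscript();
}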

To detect when the user stops speaking, you could try this when mounting your component:

// Bind the handler so that `this` inside onSpeechEnd refers to the component.
this.props.recognition.onspeechend = this.onSpeechEnd.bind(this);

(see https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition/onspeechend)
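To make the flow of Approach 2 easier to follow outside React, here is a small framework-free sketch. `createCustomTranscript`, `commit` and `dropYou` are hypothetical names invented for this example; in a real component, `commit` would correspond to the speech-end handler and the stored string to the component state.

```javascript
// Hypothetical stand-in for the component state described in Approach 2.
function createCustomTranscript(transform) {
  let customTranscript = '';
  return {
    // Call this with the library's transcript each time speech ends.
    commit(segment) {
      const modified = transform(segment).trim();
      if (modified !== '') {
        // Join with a single space so segments do not run together.
        customTranscript =
          customTranscript === '' ? modified : `${customTranscript} ${modified}`;
      }
      return customTranscript;
    },
  };
}

// Hypothetical transform: drop the word "you", ignoring case.
const dropYou = (text) =>
  text.split(/\s+/).filter((word) => word.toLowerCase() !== 'you').join(' ');

const store = createCustomTranscript(dropYou);
console.log(store.commit('Hello how are you')); // "Hello how are"
console.log(store.commit('I am good')); // "Hello how are I am good"
```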

Approach 3

A third approach, which is a variation of the second, is to detect when some new word has been "finalised" by the Speech Recognition algorithm (i.e. it's maximised its confidence in what the new word is). Then you can examine the word and decide whether to add it to your transcript state or not. Here's how you could do that using the componentWillReceiveProps lifecycle hook (note that this React feature is soon to be deprecated - I think React Hooks are the recommended way of doing this now):

componentWillReceiveProps(nextProps) {
  const { customTranscript } = this.state;
  if (this.props.interimTranscript !== '' && nextProps.interimTranscript === '') {
    const newWord = this.props.interimTranscript;
    this.setState({ customTranscript: addNewWord(customTranscript, newWord) });
  }
}

This detects when the interim transcript gets emptied, meaning that its previous value was a finalised word about to be added to the Speech Recognition transcript. This is your opportunity to filter out that word if you don't want it or add it to your own transcript state if you do.
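The snippet above calls `addNewWord` without defining it. One possible shape for it, purely as an assumption, is sketched below; `IGNORED_WORDS` is a hypothetical filter list. Note that the interim transcript may hold more than one word, in which case this sketch treats the whole string as a single unit.

```javascript
// Hypothetical filter list: words that should never reach the custom transcript.
const IGNORED_WORDS = new Set(['you']);

// Returns the custom transcript with newWord appended, unless it is filtered out.
function addNewWord(customTranscript, newWord) {
  const word = newWord.trim();
  if (word === '' || IGNORED_WORDS.has(word.toLowerCase())) {
    return customTranscript; // skip empty or unwanted words
  }
  return customTranscript === '' ? word : `${customTranscript} ${word}`;
}

console.log(addNewWord('Hello how', 'are')); // "Hello how are"
console.log(addNewWord('Hello how are', 'you')); // "Hello how are"
```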

That's all I can think of for now. I think the best solution is to maintain your own transcript state and use approach 2 or 3 to determine when to add to it.

I may be able to give better suggestions if I understand your use case better. Hope these ideas help anyway!

sb5512 commented 5 years ago

The use case is to detect "commands" in the transcript and render the text without the command. For example, "Hello How are you map" should identify "map" as a command and call a function that performs the "mapping" task, i.e. it should render "Hello1 How2 are3 you4" (the map command appends each word's position number to it).
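One way this use case could be sketched, assuming the command is always the last word spoken, is shown below. `COMMANDS` and `applyTrailingCommand` are hypothetical names for illustration and are not part of react-speech-recognition.

```javascript
// Hypothetical command table: command word -> function that rewrites the
// remaining words. "map" appends each word's 1-based position to it.
const COMMANDS = {
  map: (words) => words.map((word, index) => `${word}${index + 1}`),
};

// If the transcript ends with a known command, strip it and apply it to the
// preceding words; otherwise return the transcript with whitespace normalised.
function applyTrailingCommand(transcript) {
  const words = transcript.trim().split(/\s+/).filter((word) => word !== '');
  if (words.length === 0) return '';
  const last = words[words.length - 1].toLowerCase();
  const isCommand = Object.prototype.hasOwnProperty.call(COMMANDS, last);
  if (!isCommand) return words.join(' ');
  return COMMANDS[last](words.slice(0, -1)).join(' ');
}

console.log(applyTrailingCommand('Hello How are you map')); // "Hello1 How2 are3 you4"
```

A function like this could serve as the transform step in Approach 2, applied to each spoken segment before it is appended to the custom transcript.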

sb5512 commented 5 years ago

Thanks for the comment. These approaches give me a good idea of how to think about tackling the problem.

JamesBrill / react-speech-recognition