Closed: Nitzahon closed this issue 4 years ago.
So the speech recognition does support, for instance, Hebrew; however, when trying to write a command in Hebrew, the callback isn't invoked. According to the transcript, the phrase was spoken perfectly, but it did not activate the command callback that was set up. Also, resetTranscript does not work when called by handleReset (or perhaps it's just not called), only with the spoken 'clear' command.
For reference, the sendAns command sends the spoken phrase back to the parent.
Hi @Nitzahon thanks for raising the issue!
The cause of this problem is that the library does not actually support commands that are arrays; it was a fluke that this appeared to work in the first place. When fuzzy matching is used, the library converts the command into a string, and JavaScript is lenient enough to allow arrays to be converted into strings. As a result, your command is actually processed as "Everything is workingNothing is workingJust the audio worksJust the video worksשלום". With a low fuzzy matching threshold, this will match the speech "everything is working".
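For example, you can see the coercion in plain JavaScript - the commas it produces are presumably stripped along with other punctuation during fuzzy matching, which is how the command ends up as the single concatenated phrase above:
// An array coerces to a comma-joined string wherever a string is expected
const command = ['Everything is working', 'Nothing is working']
console.log(`${command}`) // "Everything is working,Nothing is working"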
If you break the commands down into separate objects, you'll find the Hebrew command can be matched correctly.
However, you do raise a valid point that, from an API point of view, it would be convenient to just pass in an array of commands for one callback. I can look into supporting this when I get some free time.
In the meantime, you can set your commands up like this:
const videoCommandPhrases = ['Everything is working', 'Nothing is working', 'Just the audio works', 'Just the video works', 'שלום']
const videoCommandCallback = (command, spokenPhrase) => {
  sendAns(spokenPhrase)
  handleReset()
}
// Generate one command object per phrase, all sharing the same callback
const videoCommands = videoCommandPhrases.map(phrase => ({
  command: phrase,
  callback: videoCommandCallback,
  isFuzzyMatch: true,
  fuzzyMatchingThreshold: 0.8
}))
const commands = [
  {
    command: 'clear',
    callback: ({ resetTranscript }) => resetTranscript()
  },
  ...videoCommands
]
Hope that helps!
You read my mind. I feel violated :P I haven't been on my PC all week, but this exact code has been running through my brain since Thursday. And array support would be nice; I have indeed noticed all of your points regarding how the command is processed.
One issue I'm encountering: while I can clear the transcript with a voice command using the code above, I am not able to make it work with a button press, as in this code:
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition'

const Dictaphone = () => {
  const { transcript, resetTranscript } = useSpeechRecognition()

  if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    return null
  }

  return (
    <div>
      <button onClick={SpeechRecognition.startListening}>Start</button>
      <button onClick={SpeechRecognition.stopListening}>Stop</button>
      <button onClick={resetTranscript}>Reset</button>
      <p>{transcript}</p>
    </div>
  )
}

export default Dictaphone
I can either reset the transcript using voice commands, or have it activated by a button, but not both. Ideally, I want a clear function that I can call from any part of the code, for when the user clicks reset or changes the language.
What I mean is, I can either use this:
const { transcript, resetTranscript } = useSpeechRecognition()
Or this:
const { transcript } = useSpeechRecognition({ commands })
And I'm not sure how to merge the two.
The handleReset in your example above differs from the one in your biomarkerz repo. Let's examine each one:
const handleReset = useCallback(() => {
  SpeechRecognition.stopListening()
  SpeechRecognition.startListening({
    continuous: true,
    language: 'he'
  })
}, []);
This does not call resetTranscript, so the transcript will remain the same. If your intent here was to stop the microphone and restart it with Hebrew detection, you would just need to call:
SpeechRecognition.startListening({
  continuous: true,
  language: 'he'
})
There is no need for the stopListening call. Though if you did need to stop the microphone, note that this function is asynchronous, which means you need to wait for it to complete before doing something else. This is why your original callback appeared to only stop the microphone: there was a race condition between the stopListening and startListening calls. The stop would finish after the start and leave the microphone turned off. To avoid that, you would need to make your handler async:
const handleReset = useCallback(async () => {
  // Wait for the microphone to fully stop before restarting it
  await SpeechRecognition.stopListening()
  SpeechRecognition.startListening({
    continuous: true,
    language: 'he'
  })
}, []);
In your biomarkerz repo, you use a different handleReset:
const handleReset = useCallback(() => {
  resetTranscript();
  dataTrans(transcript)
}, [transcript, dataTrans, resetTranscript]);
This one does call resetTranscript, so it will clear the transcript when the button is clicked. Removing the dataTrans call, I tried this locally and it worked fine.
I don't believe the useCallback is necessary here. handleReset is a relatively inexpensive function to create and is already being recreated frequently anyway due to transcript being a dependency. I suggest just making it a plain function without useCallback. This is a nice article on useCallback.
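For example, the plain version would just be:
// Recreated on each render, which is fine for a handler this cheap
const handleReset = () => {
  resetTranscript();
  dataTrans(transcript)
}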
Note that the resetTranscript used in the clear command callback comes from the callback args, not useSpeechRecognition - you don't need to have called the hook before defining that command. See here: "The last argument that [the command callback] function receives will always be an object containing the following properties: resetTranscript, a function that sets the transcript to an empty string". To get a reset on both the button click handler and the voice command, an example like this works:
import React from 'react'
import SpeechRecognition, { useSpeechRecognition } from '../SpeechRecognition'

const Dictaphone = () => {
  const commands = [
    {
      command: 'clear',
      callback: ({ resetTranscript }) => resetTranscript()
    }
  ]
  const { transcript, resetTranscript } = useSpeechRecognition({ commands })

  const handleReset = () => {
    resetTranscript()
    alert(`Call dataTrans with ${transcript}`)
  }

  const startListening = () => {
    SpeechRecognition.startListening({
      continuous: true,
      language: 'en-GB'
    })
  }

  if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    return null
  }

  return (
    <div style={{ display: 'flex', flexDirection: 'column' }}>
      <button onClick={handleReset}>Reset</button>
      <button onClick={startListening}>Start listening</button>
      <span>{transcript}</span>
    </div>
  )
}

export default Dictaphone
Yesterday this worked like a charm. Today the voice commands are taking an extra long time: the transcript displays properly, but the voice commands take at least a minute to fire the callback. Is there a way to check the API response time, or could something in my program be causing it to wait?
You might just need to refresh the page - I've found the Web Speech API's performance to be inconsistent if the page is left open for a long time. I'm not sure how to profile the Web Speech API - there are no visible network requests in the Network tab. You might also want to try matchInterim: true on any non-fuzzy commands to speed up the response.
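For instance, adding it to the exact-match 'clear' command from earlier:
{
  command: 'clear',
  callback: ({ resetTranscript }) => resetTranscript(),
  // Match against interim (in-progress) results rather than waiting for a final transcript
  matchInterim: true
}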
Okay, so I found out what change caused the slowdown: it was removing this code
else {
  SpeechRecognition.startListening({
    continuous: true,
    language: 'he'
  });
}
from:
if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
  return null
}
Without that conditional startListening call, it takes several seconds for voice commands to fire, as opposed to an almost instantaneous callback. Of course, listing it as continuous and setting the language is redundant at that stage thanks to the useEffect.
edit: correction, language and continuous are needed, because the else clause fires off before useEffect, causing the first speech recognition to use default values (English).
edit edit: scratch that. It appears the problem lies with it being a continuous listen. The problem is that, without the else clause, I have no way of restarting the listen if it's not continuous: without continuous it stops the listen after it detects something, and with the continuous listen it waits for around 30-60 seconds before activating commands on the transcript.
Ah, I didn't notice you were calling startListening on every render - this should be avoided. During speech, the component will be re-rendering frequently. As a result, it will be hammering the startListening method and possibly overwhelming the Web Speech API. You only need to call startListening once for each time you want to collect speech.
To start listening on "mount", you need to replace your else with a call to useEffect before the browserSupportsSpeechRecognition check:
useEffect(() => {
  SpeechRecognition.startListening({ continuous: true, language: 'he' })
}, []);

if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
  return null
}
Subsequent stops and starts (if these are actually needed) can be executed by event handlers on button clicks or voice commands.
If this still results in slow commands, perhaps you can share your latest code so I can diagnose the issue.
I'll probably change my code then, to manually start the listen on a function call. But I will also need to recognize when listening has stopped.
I found a solution that appears to work! I connected my webcam stream to hark and had it set a redux boolean on the speaking and stopped_speaking events. Then I added this useEffect:
useEffect(() => {
  if (isRecog && !listening) {
    SpeechRecognition.startListening({
      language: language
    });
  } else if (!isRecog && listening) {
    SpeechRecognition.stopListening();
  }
}, [isRecog, listening, language])
isRecog is a selector from redux signaling a request to record. All I need now is to make sure it doesn't interfere with the media recorder I have set up, but I think this is a great solution for anyone looking to activate speech recognition when the user speaks.
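For anyone wanting to replicate this, the hark side looks roughly like the sketch below - the setIsRecog action is just illustrative, so wire it up to your own store:
import hark from 'hark'

// stream is the MediaStream from the webcam (e.g. from getUserMedia)
const speechEvents = hark(stream, {})
speechEvents.on('speaking', () => {
  dispatch(setIsRecog(true)) // hypothetical redux action
})
speechEvents.on('stopped_speaking', () => {
  dispatch(setIsRecog(false))
})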