JamesBrill / react-speech-recognition

💬Speech recognition for your React app
https://webspeechrecognition.com/
MIT License
How to do commands with names and dynamic letter, e.g: What is Lux Q/W/E/R Cooldown #72

Closed exzib closed 3 years ago

exzib commented 3 years ago

I am trying to use this to build a hobby AI app where you can ask about "cooldowns" for the game League of Legends.

Basically, the commands would be structured like this:

"(What is) Lux * Cooldown"

Where * can be Q, W, E or R

How could I write this command so that the voice recognition passes the letter the user said to the callback?

Nitzahon commented 3 years ago

I would try making the callback use the spoken phrase. Take the spoken phrase as a string, remove "(What is) Lux" and "Cooldown", trim it, and then run what's left through a basic switch statement. Note that while speech recognition can recognize "Q", "W" and "E", it will most likely recognize "R" as "AR" or "are".
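A rough sketch of that approach, in case it helps. `extractAbility` is a made-up helper name, and the exact prefix and suffix patterns are assumptions based on the command in the question, so adjust them to whatever transcripts you actually see:

```javascript
// Hypothetical helper: pull the ability letter out of a full transcript
// such as "What is Lux Q Cooldown".
function extractAbility(transcript) {
  const remainder = transcript
    .trim()
    .toLowerCase()
    .replace(/^what is\s+/, '')   // optional "What is" prefix
    .replace(/^lux\s+/, '')       // champion name
    .replace(/\s*cooldown$/, '')  // trailing keyword
    .trim();

  switch (remainder) {
    case 'q':
      return 'Q';
    case 'w':
      return 'W';
    case 'e':
      return 'E';
    case 'r':
    case 'ar':   // "R" is often heard as "AR" or "are"
    case 'are':
      return 'R';
    default:
      return null; // nothing recognizable was left
  }
}
```

Returning `null` for anything unexpected lets the caller decide whether to ignore the phrase or ask the user to repeat it.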

JamesBrill commented 3 years ago

Sounds like a cool project, @exzib !

@Nitzahon 's suggestion would work, though it might be tricky to manually find the letter in the transcript. I have an alternative that might be sufficient.

You can use a named variable to capture the letter in the command:

    {
      command: '(What is) Lux :letter Cooldown',
      callback: (letter) => doSomething(letter)
    }

Then the doSomething function can operate on whatever letter was spoken. As @Nitzahon said, the Web Speech API is not smart enough to know that the user is saying letters. So you may need to map words like "are" and "queue" to the letters you're interested in.
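As a minimal sketch of that mapping: the lookup table below is only a guess at likely mishearings ("are" and "queue" come from this thread; "ar" and "cue" are assumptions), and `toAbilityLetter` and the placeholder `doSomething` are names invented for illustration:

```javascript
// Assumed homophones; extend this after testing what your browser returns.
const SPOKEN_TO_LETTER = {
  q: 'Q', queue: 'Q', cue: 'Q',
  w: 'W',
  e: 'E',
  r: 'R', are: 'R', ar: 'R',
};

// Normalize whatever the named variable captured into Q/W/E/R, or null.
function toAbilityLetter(spoken) {
  const key = String(spoken).trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(SPOKEN_TO_LETTER, key)
    ? SPOKEN_TO_LETTER[key]
    : null;
}

// Placeholder for the app's own logic.
function doSomething(ability) {
  console.log(`Looking up cooldown for ${ability}`);
}

// Commands array in the shape shown above, with the mapping applied
// before the app logic runs.
const commands = [
  {
    command: '(What is) Lux :letter Cooldown',
    callback: (letter) => {
      const ability = toAbilityLetter(letter);
      if (ability) doSomething(ability);
    },
  },
];
```

Unrecognized words are silently dropped here; you may prefer to tell the user the letter was not understood.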

I did notice that the Web Speech API does understand context a little bit. So if you say "letter R", it correctly interprets the "R" as the letter rather than the word "are". So you could make the command '(What is) Lux letter :letter Cooldown'. This makes the voice interface a bit awkward for the user, but it might make the logic more reliable.

Also note that your users might have difficulties with proper nouns like "Lux". The Web Speech API doesn't know the vocabulary of League of Legends, so it might interpret that as "locks".

The good news is that you can teach the Web Speech API the LoL vocabulary by specifying a custom grammar. I've never done this before so it's outside my circle of knowledge, but I'd be interested to see if you could make that work. Mozilla have documented how to do this here. To get the underlying recognition object from react-speech-recognition to set the grammar, you can call getRecognition.
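An untested sketch of what setting a grammar might look like, following the JSGF format used in Mozilla's Web Speech API examples. The word list and the helper names `buildGrammar` and `applyGrammar` are assumptions, and browser support for grammars is uneven, so this may have no effect in some browsers:

```javascript
// Hypothetical vocabulary; list the words your app needs.
const VOCABULARY = ['Lux', 'cooldown', 'Q', 'W', 'E', 'R'];

// Build a JSGF grammar string with one public rule listing the words
// as alternatives.
function buildGrammar(words) {
  return `#JSGF V1.0; grammar lol; public <term> = ${words.join(' | ')} ;`;
}

// Attach the grammar to a recognition object, e.g. the one returned by
// getRecognition. GrammarList is the browser's SpeechGrammarList constructor.
function applyGrammar(recognition, GrammarList, words) {
  const list = new GrammarList();
  list.addFromString(buildGrammar(words), 1); // weight is between 0 and 1
  recognition.grammars = list;
  return recognition;
}

// Browser-only usage (not executed here):
//   import SpeechRecognition from 'react-speech-recognition';
//   const GrammarList =
//     window.SpeechGrammarList || window.webkitSpeechGrammarList;
//   applyGrammar(SpeechRecognition.getRecognition(), GrammarList, VOCABULARY);
```

Passing the constructor in as an argument keeps the helper free of direct `window` access, which makes it easier to try out with a stub before wiring it into the app.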

Alternatively, perhaps there are other names for the QWER keys that you can listen for (e.g. "ultimate"). Designing voice interfaces is hard, so you may need to get creative with the domain to help out the Speech Recognition engine.