TalAter / annyang

:speech_balloon: Speech recognition for your site
https://www.talater.com/annyang/
MIT License

Handling misrecognitions and getting lisp friendly #425

Closed: The-Linguist closed this issue 3 years ago

The-Linguist commented 3 years ago

In most cases, the success rate and accuracy of speech recognition are just amazing. However, there are times when "I scream" is recognized as "ice cream" or vice versa, or "real eyes" as "realize" or "real lies", and so on. Another issue is that a user with a lisp might only be able to say "three" when trying to say "tree". Therefore, to give users a "perfect experience", it is essential to forgive mispronunciations and work around misrecognitions. The more tests I run, the more I feel obliged to account for possible mistakes and mismatches in the code. For example,

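    // Placeholder handler so the example runs; replace with your own logic
    function functionOfWhatToDoWhenUserSaysTree() {
      console.log('User said "tree"');
    }

    var commands = {
      'tree': functionOfWhatToDoWhenUserSaysTree,
      'three': functionOfWhatToDoWhenUserSaysTree,
      'free': functionOfWhatToDoWhenUserSaysTree
      // ... this list could get long for certain words and phrases, especially in languages other than English!
    };
    annyang.addCommands(commands);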

What I would like is to shorten that code so that a single string contains all such possibilities and everything fits on one line. The code could then look something like:

    var commands = { 'tree OR three OR free': functionOfWhatToDoWhenUserSaysTree };

or, more simply:

    var commands = { 'tree, three, free': functionOfWhatToDoWhenUserSaysTree };

Any other syntax would do as well; it doesn't have to be commas or a capitalized OR keyword.
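
The requested shorthand can be approximated in user code today. The sketch below assumes two hypothetical helpers, `expandAlternatives` and `buildCommands`, neither of which is part of annyang: they split a comma-separated key into separate phrases and map each phrase to the same callback, producing an ordinary commands object.

```javascript
// Hypothetical helper, NOT part of annyang's API: expands a comma-separated
// list of alternative phrases into one command per phrase, all sharing
// the same callback.
function expandAlternatives(spec, callback) {
  const commands = {};
  for (const phrase of spec.split(',')) {
    const trimmed = phrase.trim();
    // Skip empty entries, e.g. from a trailing comma
    if (trimmed !== '') {
      commands[trimmed] = callback;
    }
  }
  return commands;
}

// Hypothetical helper: expands every key of a specs object and merges the
// results into one plain commands object for annyang.addCommands().
function buildCommands(specs) {
  const commands = {};
  for (const [spec, callback] of Object.entries(specs)) {
    Object.assign(commands, expandAlternatives(spec, callback));
  }
  return commands;
}

// Usage, assuming annyang is loaded and the handler exists:
// annyang.addCommands(buildCommands({
//   'tree, three, free': functionOfWhatToDoWhenUserSaysTree
// }));
```

Because each piece is passed through as its own command string, annyang still sees three separate commands; the trade-off is that a phrase can no longer contain a literal comma.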

That would be nice and useful for everyone, wouldn't it? Sorry if this is already possible and I just didn't know. Feel free to close this issue if there is already a way to do it.

TalAter commented 3 years ago

Hey @The-Linguist

The basic syntax for defining commands is relatively simple and doesn't support that. But there is a solution.

You can define alternative transcriptions (and much more) by using regular expressions to define your commands.

For example:

    const commands = {
      'favorite tree': {
        'regexp': /^What is your favorite (tree|three|free)$/,
        'callback': functionOfWhatToDoWhenUserSaysTree
      }
    };
    annyang.addCommands(commands);

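
Writing the alternation by hand works well for a few words. For longer lists, a small helper can assemble the same `{regexp, callback}` object. This is a sketch, not part of annyang: `escapeRegExp` and `alternativesCommand` are hypothetical names, and the callback takes the captured word as its argument on the assumption that annyang passes regexp capture groups to the callback.

```javascript
// Hypothetical helpers, NOT part of annyang's API.
// Escapes regexp metacharacters so each alternative is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Builds a { regexp, callback } command object, in the shape shown above,
// from a fixed prefix and a list of alternative words.
function alternativesCommand(prefix, alternatives, callback) {
  const group = alternatives.map(escapeRegExp).join('|');
  return {
    // ^...$ anchors the whole phrase; the "i" flag ignores case
    regexp: new RegExp('^' + escapeRegExp(prefix) + '(' + group + ')$', 'i'),
    callback: callback
  };
}

const favoriteTree = alternativesCommand(
  'What is your favorite ',
  ['tree', 'three', 'free'],
  (word) => console.log('Heard: ' + word)
);
// annyang.addCommands({ 'favorite tree': favoriteTree });
```

Escaping matters once the list contains words with characters such as "." or "?", which would otherwise change the meaning of the pattern.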
The-Linguist commented 3 years ago

Thank you, @TalAter. Your support is awesome.

I have come up with a minimalist solution. See https://github.com/TalAter/annyang/issues/426

TalAter commented 3 years ago

By the way, the browser's speech recognition engine doesn't return just a single result for what it thinks you said. It returns several, and annyang matches the top 5 possible phrases against your commands, so some misrecognitions might be caught automatically without you having to write out every alternative.
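
To make the matching order concrete, here is a simplified model of the behavior described above, assuming the engine returns alternatives ordered from most to least likely. `matchFirst` and the command objects are hypothetical and this is not annyang's implementation; it only shows how a misrecognized top result can still match through a lower-ranked alternative.

```javascript
// Simplified illustration, NOT annyang's actual source: try each alternative
// phrase in order (most likely first) against each command, and run the
// first command that matches.
function matchFirst(phrases, commands, maxAlternatives = 5) {
  for (const phrase of phrases.slice(0, maxAlternatives)) {
    const text = phrase.trim();
    for (const { name, regexp, callback } of commands) {
      const result = regexp.exec(text);
      if (result) {
        // Pass any captured groups to the callback as arguments
        callback(...result.slice(1));
        return { phrase: text, command: name };
      }
    }
  }
  return null; // no alternative matched any command
}

const treeCommands = [
  { name: 'tree', regexp: /^(tree|three)$/i, callback: (w) => console.log('Matched ' + w) }
];

// The top alternative "free" matches nothing, but the second one, "three", does.
const hit = matchFirst(['free', 'three', 'tree'], treeCommands);
```

When testing, annyang's `resultNoMatch` callback receives the array of possible phrases, so logging it is one way to discover which misrecognitions are worth adding as explicit alternatives.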


The-Linguist commented 3 years ago

That's great. It definitely makes our apps MUCH more usable. Nevertheless, my tests show that the programmer's intervention can still be necessary to bring things closer to perfect, especially when the browser is listening to speech that is not in English. In Japanese, for instance, there are 10 different kanji that are all pronounced "ki", and another 10 for "hi". I appreciate your kindness and concern. Regards!
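
For homophones like these, the regexp approach from earlier in the thread can list every character the engine might return. The sketch below is a hypothetical example: the kanji list, the phrase 木が好き ("I like trees"), and the command name are illustrative choices, and the `ja-JP` language code is an assumption about the target locale.

```javascript
// Hypothetical example: the spoken word "ki" may come back as any of several
// kanji (木 tree, 気 spirit, 黄 yellow, 機 machine) or as kana. This command
// accepts all of them in the phrase "ki ga suki" and treats them as 木.
// The variant list is illustrative, not exhaustive, and assumes the
// transcript contains no spaces or trailing punctuation.
const KI_VARIANTS = ['木', '気', '黄', '機', 'き', 'キ'];

const likeTreeCommand = {
  regexp: new RegExp('^(' + KI_VARIANTS.join('|') + ')が好き$'),
  callback: (heard) => console.log('Treating "' + heard + '" as 木 (tree)')
};

// Assuming annyang is loaded:
// annyang.setLanguage('ja-JP');
// annyang.addCommands({ 'like tree': likeTreeCommand });
```

Keeping the variants in one array makes it easy to extend the list as testing turns up new misrecognitions.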