JamesBrill / react-speech-recognition

💬 Speech recognition for your React app
https://webspeechrecognition.com/
MIT License
657 stars 119 forks

StringSimilarity - Added string similarity check on transcript against a given array of strings #46

Closed urbanL1fe closed 4 years ago

urbanL1fe commented 4 years ago

Added functionality to the useSpeechRecognition hook so that it can receive an array of strings that the end user will potentially say, a string similarity comparison function that returns a ratio, and a number (the ratio threshold). For every speech input, the hook returns the similarity ratio between the transcript and each expected string from the array if it is bigger than the threshold we have defined. The defaults are: an empty array, Dice's similarity coefficient, and zero.
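
For reference, a minimal sketch of Dice's similarity coefficient over character bigrams - the measure that packages like string-similarity implement (identifiers here are illustrative, not part of this PR):

  // Counts shared character bigrams between two strings and returns a
  // ratio between 0 (no overlap) and 1 (identical bigram multisets).
  // Assumes both inputs have at least two characters.
  const diceCoefficient = (first, second) => {
    const countBigrams = (s) => {
      const counts = new Map()
      for (let i = 0; i < s.length - 1; i++) {
        const bigram = s.substring(i, i + 2)
        counts.set(bigram, (counts.get(bigram) || 0) + 1)
      }
      return counts
    }
    const a = countBigrams(first)
    const b = countBigrams(second)
    let intersection = 0
    for (const [bigram, count] of a) {
      intersection += Math.min(count, b.get(bigram) || 0)
    }
    return (2 * intersection) / (first.length - 1 + second.length - 1)
  }

  // e.g. diceCoefficient('Beijing', 'Bejing') is roughly 0.73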

JamesBrill commented 4 years ago

Hi Nikos! Thank you for opening this PR - it introduces an interesting concept. The code looks solid and I'm grateful for you including unit tests. 👌

What I would like to know more about is the "why" behind this change - i.e. the use case you foresee this being used for. I see this reflects the specific use case in your geoType project, but wonder how others may use it. My interpretation of this change is to enable "fuzzy matching" on commands. This would make sense when the commands are potentially easy to mispronounce or be misinterpreted by the Speech Recognition engine (e.g. capital names like in your project).

I have two alternative proposals for how we can move forward with this:

Limit it to an example in the docs

For this library, I try to keep the API simple and applicable to the majority of use cases. I believe there are more niche use cases that can be built on top of this library, and fuzzy matching might be one of them. Something on my todo list is to add a "Recipes" section to the documentation for this library, giving some examples of how to use it to implement interesting patterns. The main one would be a "push to talk" button, which is what most consumers use this for, but "fuzzy matching" could be another useful recipe. All the code changes in this PR can be replicated by a consumer via a simple useEffect - e.g. something like:

  import { useEffect } from 'react'
  import { useSpeechRecognition } from 'react-speech-recognition'
  import stringSimilarity from 'string-similarity'

  // Inside a React component; doSomethingWith is the consumer's own handler
  const { transcript } = useSpeechRecognition()
  const stringsToListenFor = ['Hong Kong', 'Barcelona']
  const stringMatchThreshold = 0.8
  useEffect(() => {
    stringsToListenFor.forEach((stringToListenFor) => {
      // Using the string-similarity library
      const similarityRatio = stringSimilarity.compareTwoStrings(transcript, stringToListenFor)
      if (similarityRatio > stringMatchThreshold) {
        doSomethingWith(stringToListenFor)
      }
    })
  }, [transcript])

So what you could do is raise a GitHub issue outlining a good example of the fuzzy matching use case and I can turn that into some nice docs linked from the README. You are also welcome to contribute such a Recipes section yourself.

Implement it within the commands API

Alternatively, we could proceed with integrating fuzzy matching into this library. I think the API as it stands in this PR could be simplified a bit for general consumption. For example, the output stringsSimilarToTranscript is tricky to consume, as you have to iterate over each element to decide how you want to react to it. I also think details like stringSimilarityRatioFunc, while potentially useful for the few consumers who want to use different string matching algorithms, overcomplicate the API. There are also two different ways of specifying strings to match now: commands and stringsYouExpectToListen. This adds more cognitive overhead for consumers trying to understand how to use this library.
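
For illustration, a rough sketch of what consuming this output might look like (the exact shape of stringsSimilarToTranscript is defined in the PR diff, which isn't reproduced here; this sketch assumes it is an array of matched strings, and doSomethingWith is a hypothetical handler):

  const { transcript, stringsSimilarToTranscript } = useSpeechRecognition({
    stringsYouExpectToListen: ['Hong Kong', 'Barcelona']
  })

  // The consumer has to loop over the output to decide how to react
  stringsSimilarToTranscript.forEach((matchedString) => {
    doSomethingWith(matchedString)
  })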

What may be a cleaner interface would be to extend the existing commands property, make the fuzzy matching reactive (i.e. specify a callback to fire when the fuzzy match is satisfied), and allow it to be customised per command. As the examples below illustrate, the following properties could be read from each command object:

command: the string literal to fuzzy match the transcript against
callback: the function to fire when a sufficiently similar match is detected
isFuzzyMatch: a boolean that turns fuzzy matching on for this command
fuzzyMatchingThreshold: the similarity ratio (between 0 and 1) that must be exceeded for the callback to fire

Note that for commands where fuzzy matching is turned on, you will need to ignore splats, parentheses and regex. Fuzzy matching would presumably only work on string literals.
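
For context, a hypothetical sketch of the pattern-style commands referred to above (clearMessage and openSettings are illustrative callbacks); fuzzy matching would skip these and apply only to plain string literals like 'London':

  const patternCommands = [
    // Splat: captures whatever phrase follows "go to"
    { command: 'go to *', callback: (site) => console.log(site) },
    // Parentheses: the word "please" is optional
    { command: 'clear the message (please)', callback: clearMessage },
    // Regular expression
    { command: /^open (the )?settings$/, callback: openSettings }
  ]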

The consumer could then specify their "strings to listen for" like any other command:

  const commands = [
    {
      command: 'London',
      callback: onCapitalSpoken,
      isFuzzyMatch: true,
      fuzzyMatchingThreshold: 0.8
    },
    {
      command: 'Paris',
      callback: onCapitalSpoken,
      isFuzzyMatch: true,
      fuzzyMatchingThreshold: 0.9
    },
    // ... other commands
  ]

Or, if the callback and threshold are to be the same for all "strings to listen for":

  const stringsToListenFor = ['London', 'Paris', 'Berlin', 'Beijing']
  const commands = stringsToListenFor.map(capital => ({
    command: capital,
    callback: onCapitalSpoken,
    isFuzzyMatch: true,
    fuzzyMatchingThreshold: 0.8
  }))
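
To make the proposal concrete, here is a rough sketch of how the hook might evaluate such commands internally - a hypothetical helper, not an actual implementation, with the 0.8 default threshold as an assumption:

  import stringSimilarity from 'string-similarity'

  // Hypothetical matcher: fires each fuzzy command's callback when the
  // transcript is similar enough to the command string
  const matchFuzzyCommands = (transcript, commands) => {
    commands
      .filter(({ isFuzzyMatch }) => isFuzzyMatch)
      .forEach(({ command, callback, fuzzyMatchingThreshold = 0.8 }) => {
        const similarityRatio = stringSimilarity.compareTwoStrings(
          transcript.toLowerCase(),
          command.toLowerCase()
        )
        if (similarityRatio > fuzzyMatchingThreshold) {
          // Pass the matched command and ratio so a shared callback like
          // onCapitalSpoken can tell which command was matched
          callback(command, similarityRatio)
        }
      })
  }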

I think fuzzy matching would be a powerful addition to the commands API, so I'm in favour of the second alternative.

Hope these suggestions help and thanks again for the contribution. 👍

urbanL1fe commented 4 years ago

Thank you for your prompt and really informative reply.

I thought it would be useful for the user to be able to listen for expected results that are otherwise hard to pronounce or are usually misinterpreted by the Speech Recognition engine, as you said. These may include place names as in geoType, menu items in a restaurant, sports team names, etc.

I really liked the "Implement it within the commands API" approach and I have a working solution locally. I will do some more testing and then commit it for discussion.

Would it be better to open a new pull request for that?

JamesBrill commented 4 years ago

> I have a working solution locally

Great!

> Would it be better to open a new pull request for that?

Whatever suits you. Perhaps make a fresh PR and reference this one.

JamesBrill commented 4 years ago

Done in #46