csdcorp / speech_to_text

A Flutter plugin that exposes device specific text to speech recognition capability.
BSD 3-Clause "New" or "Revised" License
351 stars 218 forks

How can I make the plugin recognise single letters? #437

Closed ksegla closed 7 months ago

ksegla commented 7 months ago

Hi

I have an issue recognising single letters such as A and B (tested in French and English). The same thing happens with some single-syllable words (such as "un" and "deux" in French). I know this may be a limitation of my device software (Samsung S22+), but I was wondering whether there is a way to trick the plugin into thinking that a word came before, since saying "letter A" or "letter B" always works. The issue really seems to be with single syllables: there is no issue with W ("double u"), for instance.

From a Google search, I found this: https://stackoverflow.com/questions/52419808/how-to-get-google-cloud-speech-voice-to-text-to-recognize-letters-and-sounds It's about how a list of hints (words) can be supplied to help the speech to text return letters etc.

This issue is close to https://github.com/csdcorp/speech_to_text/issues/421 but that issue is closed and I'm opening this one to try to figure out workarounds. The problem seems common enough.

Thanks

sowens-csd commented 7 months ago

Unfortunately no, none of the underlying speech implementations provide the ability to use hint lists. Have you checked the list of alternate recognitions in the result to see whether the correct recognition is there?
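To illustrate the suggestion of checking the alternates, here is a minimal, language-agnostic sketch in Python. It assumes the alternates have already been extracted as plain `(text, confidence)` pairs; that shape and the name `pick_single_letter` are hypothetical and not the plugin's actual types, so the idea would need porting to Dart.

```python
import string


def pick_single_letter(alternates):
    """Return the highest-confidence alternate that is a single ASCII letter.

    `alternates` is an assumed list of (text, confidence) pairs; returns None
    when no alternate is a single letter (including when the list is empty).
    """
    best = None
    for text, confidence in alternates:
        candidate = text.strip().upper()
        # Keep only one-character results that are plain ASCII letters.
        if len(candidate) == 1 and candidate in string.ascii_uppercase:
            if best is None or confidence > best[1]:
                best = (candidate, confidence)
    return best[0] if best else None
```

This only helps when the recognizer returns at least one alternate; if it reports no match at all, there is nothing to scan.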

ksegla commented 7 months ago

@sowens-csd The alternate list doesn't return anything. I went into the Android code and made it log the root errors. It really was the No Match error, so there is zero output to potentially work with. I tried to find some way to feed hints to the recognizer and came across this: https://developer.android.com/reference/android/speech/RecognizerIntent#EXTRA_BIASING_STRINGS but it was only introduced in API 33. I still tried it, but even then, feeding it a list of A, B, C etc. didn't seem to work (I didn't triple-check, though).

Another idea of mine was to hackily inject some audio ("letter", "number" etc.) before the actual spoken word, since saying "letter A", "letter B" or "number 2" always works while "A", "B", "2" always fail. I found this https://developer.android.com/reference/android/speech/RecognizerIntent#EXTRA_AUDIO_SOURCE but it is also from API 33, with an earlier iteration introduced in API 31. I didn't bother trying it.
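Short of injecting audio, the same observation suggests a simpler app-side workaround: prompt the user to say the carrier word themselves ("letter A") and strip it from the recognized text. Below is a minimal sketch in Python, assuming the recognizer returns the phrase as a plain string; the name `strip_carrier` and the carrier list are hypothetical and would need porting to Dart.

```python
# Assumed carrier words, taken from the phrases reported to work above.
CARRIER_WORDS = ("letter", "number")


def strip_carrier(recognized, carriers=CARRIER_WORDS):
    """Drop a leading carrier word ("letter", "number") from recognized text.

    The text is returned unchanged (apart from trimming) when it does not
    start with a carrier word or consists of the carrier word alone.
    """
    words = recognized.strip().split()
    if len(words) >= 2 and words[0].lower() in carriers:
        return " ".join(words[1:])
    return recognized.strip()
```

For example, `strip_carrier("letter A")` yields `"A"`, while an ordinary phrase such as `"hello world"` passes through untouched.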

It is just such a pity that those basic use cases are not adequately handled by the speech to text engines :/. Nothing you can do about it. Thanks for your work on this plugin :)