futo-org / voice-input

Issue tracker for Voice Input
86 stars 0 forks

Provide the SpeechRecognizer API #7

Open Kaljurand opened 11 months ago

Kaljurand commented 11 months ago

Expose the recognizer via the SpeechRecognizer API, so that other apps could use it directly, not only via the IME API, e.g. I'd like to use it via Kõnele (http://kaljurand.github.io/K6nele/about/).

In general, I think the app should prioritize the service aspect (incl. providing an API that maps to the entire Whisper API), and focus less on the IME aspect, because the latter requires non-trivial UI/UX work and needs to compete with established apps like Gboard (while there are hardly any apps in the pure service space).

abb128 commented 11 months ago

I've not focused on this for now because I'm not actually aware of any user-facing apps that make use of the SpeechRecognizer API, apart from Kõnele, so I'm not sure if it would make any difference for most users. Every app I've tested calls the implicit intent or, in the case of keyboards, switches to a voice IME. It is something I'd like to get to in the future though. If there are any more apps, please let me know as it would help prioritize this.

Kaljurand commented 11 months ago

I'm not aware of any prominent current apps that use the SpeechRecognizer API, but I haven't done much research on this either. It might be that the SpeechRecognizer is used in non-standard situations as a fallback, e.g. if Google's infrastructure is not detected on the device. About 10 years ago things were different. Most keyboard and assistant apps connected to the SpeechRecognizer in order to provide support for speech input. Even the Google Translate app used SpeechRecognizer, although only for languages that it didn't support itself, so it was possible to do voice-to-voice translation where Kõnele performed Estonian speech-to-text (instead of Google, which did not support Estonian at the time). Things have deteriorated since then: apps are commonly locked in to a single provider and/or call the RecognizerIntent pop-up (resulting in a less integrated UI). It's a bit puzzling to me why this deterioration has happened, but part of the reason could be that there hasn't been a good, open, free, multilingual speech-to-text service available on Android. This could now change with FUTO Voice Input, which could lead the way so that we don't get stuck in a chicken-and-egg situation.

It's also reasonable to assume that for some use cases the provided UIs are not going to be satisfactory, so the host app would want to use the service and build a UI from scratch (see also https://stackoverflow.com/questions/6316937/how-can-i-use-speech-recognition-without-the-annoying-dialog-in-android-phones). E.g. the current IME is not responsive/resizable to different form factors, and doesn't provide some essential buttons (delete last word, switch language). Not to mention some more advanced features like editing or action commands (e.g. a "Send" command, in case the IME is used in a messaging app), or UI support for Whisper-specific features like translation and prompting. Covering all these UI needs seems like more work than providing a service with a direct API mapping to Whisper. The API would simply extend android.speech.RecognizerIntent with some new constants (e.g. "EXTRA_RETURN_TRANSLATION", "EXTRA_BIASING_PROMPT").
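To make the last point concrete, here is a rough sketch of how a service might map such extras onto Whisper options. Everything in it is an assumption for illustration: the class name WhisperRequest, the two "org.example" key strings, and the use of a plain Map in place of the Android Bundle that would carry the extras in a real app. Only EXTRA_LANGUAGE mirrors an existing RecognizerIntent constant.

```java
import java.util.Map;

// Sketch only: a plain Map stands in for the android.os.Bundle of intent extras,
// so the example compiles and runs without the Android SDK.
final class WhisperRequest {
    // Existing key: the value of android.speech.RecognizerIntent.EXTRA_LANGUAGE.
    static final String EXTRA_LANGUAGE = "android.speech.extra.LANGUAGE";

    // Hypothetical keys named after the proposal above. The "org.example" prefix
    // is a placeholder, not something FUTO Voice Input defines.
    static final String EXTRA_RETURN_TRANSLATION =
            "org.example.voiceinput.extra.RETURN_TRANSLATION";
    static final String EXTRA_BIASING_PROMPT =
            "org.example.voiceinput.extra.BIASING_PROMPT";

    final String language; // null means: let the model detect the language
    final String task;     // Whisper distinguishes "transcribe" and "translate"
    final String prompt;   // text used to bias decoding; empty if none was given

    private WhisperRequest(String language, String task, String prompt) {
        this.language = language;
        this.task = task;
        this.prompt = prompt;
    }

    // Reads the extras defensively: missing or wrongly typed values fall back
    // to defaults instead of throwing.
    static WhisperRequest fromExtras(Map<String, Object> extras) {
        Object lang = extras.get(EXTRA_LANGUAGE);
        Object translate = extras.get(EXTRA_RETURN_TRANSLATION);
        Object prompt = extras.get(EXTRA_BIASING_PROMPT);
        return new WhisperRequest(
                lang instanceof String ? (String) lang : null,
                Boolean.TRUE.equals(translate) ? "translate" : "transcribe",
                prompt instanceof String ? (String) prompt : "");
    }
}
```

A host app that sets none of the new extras would get plain transcription, so existing SpeechRecognizer clients would presumably keep working unchanged.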

Gymcap commented 11 months ago

I would also benefit from exposing the recognizer through the SpeechRecognizer API, as I use Dicio as my open source assistant app to avoid Google and other creepy services, and for that I'm currently using the Vosk library, which is significantly less accurate than this app.

AeliusSaionji commented 11 months ago

The keyboard that I use (MessageEase) has a microphone button for quick dictation input, which I believe uses this API.

Cris-Edmundson commented 10 months ago

@AeliusSaionji by the way, there's a currently maintained fork of MessageEase on F-Droid now called Thumb-Key

pvagner commented 10 months ago

Third-party assistive tools such as voice assistants and screen reading apps used by visually disabled users also use this API to provide dictation features. These are all closed source, though. Examples are Corvus, an accessibility suite developed by the Slovak non-profit Touch and Speech, and the Jieshuo screen reader.

Bu156 commented 10 months ago

I use Duolingo on CalyxOS. It works with a modified version of Dicio that exports this API, but as the speech recognition I have set up with Dicio only works for French (and makes more mistakes than FUTO), I would prefer to use FUTO with a wider choice of models.

tashijayla commented 6 months ago

I think exposing the SpeechRecognizer API would be a great idea because apps like Signal, Session, Matrix, and Threema would be able to add a popup next to voice messages saying, "We now support translating voice messages to text using the FUTO Voice Input API. (Please download FUTO Voice Input and configure the app to get access to this ability)." I would enjoy asking apps to integrate FUTO Voice Input instead of proprietary Google speech-to-text software. If we can get an app like Signal to integrate FUTO Voice Input into their Android client, that could lead to funding for this fantastic project and free publicity.

Lastly, thank you for such a fantastic app. I have dyslexia and finally don't need to use the Google voice recorder. Now I can finally move away from GrapheneOS to a different ROM like /e/OS, and I finally have a way to convert voice to text on any phone offline 😃