Stypox / dicio-android

Dicio assistant app for Android
GNU General Public License v3.0
652 stars 64 forks

Possibility to choose the STT #131

Open MXC48-zz opened 1 year ago

MXC48-zz commented 1 year ago

The Vosk STT model works well, but it is surely not optimal for all languages. It would be nice to have a choice between several STT engines in the application, such as Coqui-ai. It's an idea worth putting in place, of course, but not a priority 👍

Stypox commented 1 year ago

The structure to make this work is already in place. A while ago there was an Azure STT integration (the azure branch is still available), but I dropped it because my trial key expired and I wanted to focus on an on-device STT anyway. If anybody wants to build an integration with another STT service, feel free to do so!
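As an illustration of what "choosing the STT" could look like in code, here is a minimal sketch of a pluggable engine registry. It is not Dicio's actual structure: `SttEngine`, `SttRegistry`, `StubSttEngine` and all method names are hypothetical, and the stub only stands in for a real backend such as Vosk.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical abstraction over a speech-to-text backend (not Dicio's real API). */
interface SttEngine {
    String id();
    boolean supportsLanguage(String languageTag);
    String transcribe(byte[] pcmAudio);
}

/** Keeps the available engines and picks one based on the user's preference. */
final class SttRegistry {
    // LinkedHashMap keeps registration order, which doubles as fallback order.
    private final Map<String, SttEngine> engines = new LinkedHashMap<>();

    void register(SttEngine engine) {
        engines.put(engine.id(), engine);
    }

    /**
     * Returns the preferred engine if it is registered and supports the language,
     * otherwise the first registered engine that supports the language.
     */
    Optional<SttEngine> select(String preferredId, String languageTag) {
        SttEngine preferred = engines.get(preferredId);
        if (preferred != null && preferred.supportsLanguage(languageTag)) {
            return Optional.of(preferred);
        }
        return engines.values().stream()
                .filter(e -> e.supportsLanguage(languageTag))
                .findFirst();
    }
}

/** Stand-in engine for demonstration; a real one would wrap Vosk, a cloud API, etc. */
final class StubSttEngine implements SttEngine {
    private final String id;
    private final Set<String> languages;

    StubSttEngine(String id, String... languages) {
        this.id = id;
        this.languages = new HashSet<>(Arrays.asList(languages));
    }

    @Override public String id() { return id; }

    @Override public boolean supportsLanguage(String languageTag) {
        return languages.contains(languageTag);
    }

    @Override public String transcribe(byte[] pcmAudio) {
        // No recognition happens here; the stub only reports what it received.
        return "[" + id + ": " + pcmAudio.length + " bytes]";
    }
}
```

With a registry like this, a settings screen would only need to store the preferred engine id; falling back to another engine when the preferred one lacks the user's language is a design choice of this sketch, not something the thread prescribes.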

RokeJulianLockhart commented 1 year ago

https://github.com/Stypox/dicio-android/issues/131#issuecomment-1368421869

@Stypox, does it not use the installed Google TTS provider? It appears not to, since I need to download the voice model at initial launch. Please add that as an option, too, so I don't need to waste space having both downloaded.

Being able to use the system component is always necessary.

navid-zamani commented 1 year ago

May I add that it should be possible to choose a bigger/arbitrary Vosk model?

Because the small model is not really usable for STT in instant messaging contexts and such. At least the German one, and especially with me having literally been a member of a club of rarely used words. :)

HyperCriSiS commented 11 months ago

I just tried to swap it by manually copying in another Vosk model, but unfortunately it did not work. It even made my system crash :-D

HyperCriSiS commented 8 months ago

I just tried Sayboard, which also uses Vosk. There it is possible to use a bigger model, from which I deleted the rnnlm and rescore folders. The model still works and is by far superior to the small one.
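The trimming step described above can be sketched as a small utility. This is only an illustration of deleting the two folders named in the comment; the class and method names are hypothetical, and whether a given model still loads after trimming is something the comment reports for one model, not something this code checks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper that removes the sub-folders the commenter deleted by hand. */
final class VoskModelTrimmer {
    // Folder names taken from the comment; assumed optional, not verified here.
    static final List<String> OPTIONAL_DIRS = Arrays.asList("rnnlm", "rescore");

    /** Deletes the optional sub-folders of modelDir and returns how many were removed. */
    static int trim(Path modelDir) throws IOException {
        int removed = 0;
        for (String name : OPTIONAL_DIRS) {
            Path dir = modelDir.resolve(name);
            if (Files.isDirectory(dir)) {
                deleteRecursively(dir);
                removed++;
            }
        }
        return removed;
    }

    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so that children come before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Calling `VoskModelTrimmer.trim(modelDir)` on an unpacked model directory leaves every other folder untouched, so it is best run on a copy until the trimmed model has been confirmed to work.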

paolo-caroni commented 4 months ago

I'd also like to point out a "new" engine, sherpa onnx. I use it as my system TTS (it's much better than espeak and is also available in my native language), but it also has STT/ASR capability (at the moment in a different apk and without an implementation of the Speech Recognition API). It is developed by Xiaomi (which cooperates on the Kaldi project) and the open-source community; it's FOSS and offline. @MXC48-zz It can be used with Coqui models if you are interested.

navid-zamani commented 4 months ago

@paolo-caroni: Looks like the STT only has models for Chinese and English… The TTS apk was really good though! Thanks for the link!

paolo-caroni commented 4 months ago

"@paolo-caroni: Looks like the STT only has models for Chinese and English… The TTS apk was really good though! Thanks for the link!"

@navid-zamani I don't want to go too far off topic with this response, but it is at an early stage on Android; with small changes to the code you can use different models for STT as well. You can also train your own model if you want.

I pointed to this project here because there is a lot of activity around free/libre offline STT and TTS engines: Vosk, TensorFlow (TensorFlowTTS and TensorSpeech), next-gen Kaldi (sherpa onnx and others), Coqui-ai, Mozilla DeepSpeech, etc. There are also many free/libre datasets; I "donate" my voice in Italian to Common Voice, for example. So I think it would be a good idea to also add Android's standard voice input API as an input, since I hope that in the future the user will be able to choose, directly in the Android settings, which offline FOSS engine to use for all apps that need one (without downloading and storing GiB of different models for different languages for different engines). That is indeed the title of this issue: choose the STT. Furthermore, in the future "dicio" could outsource STT in order to concentrate the developers' energy on the actions of the voice assistant itself, but this choice is up to @Stypox alone.

paolo-caroni commented 4 months ago

@RokeJulianLockhart

"does it not use the installed Google TTS provider?"

I understand what you meant, but please don't confuse TTS and STT; they are opposites. Dicio can use the system's preferred TTS engine (when it talks to you, it does so through the system TTS output), but it currently cannot use the system's preferred STT engine. Also, Google's "Speech Recognition & Synthesis" (which is TTS and STT in one apk) is NOT installed on all devices, but all devices have the ability to install an STT engine.