Stypox / dicio-android

Dicio assistant app for Android
GNU General Public License v3.0

Wake up word / wakeword recognition #48

Open Stypox opened 2 years ago

Stypox commented 2 years ago

All assistants have a wake up word (e.g. "Hey Google"), so Dicio should have it, too. This should be doable with a service running in the background with Vosk that keeps listening. Athena already has this feature (video), we might take inspiration from it. The wake word recognizer should obviously be easy to enable/disable in settings, and should probably be implemented with a foreground service, so that newer Android versions do not force close it after a while.
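If a general-purpose recognizer such as Vosk is kept listening and produces text, the remaining step is deciding whether that text contains the wake phrase. Below is a minimal plain-Java sketch of that check, including the on/off setting mentioned above. Everything in it is an assumption for illustration: the class name `WakePhraseMatcher`, the example phrase "hey dicio", and the idea that the recognizer hands over a plain transcript string. It is not Dicio's or Vosk's actual API.

```java
import java.util.Locale;

/** Hypothetical helper: decides whether a recognized transcript contains the wake phrase. */
final class WakePhraseMatcher {
    private final String wakePhrase;   // normalized wake phrase, e.g. "hey dicio" (made-up example)
    private volatile boolean enabled;  // mirrors a user-facing enable/disable setting

    WakePhraseMatcher(String wakePhrase, boolean enabled) {
        this.wakePhrase = normalize(wakePhrase);
        this.enabled = enabled;
    }

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    /** Lowercases, turns every run of non-letter/non-digit characters into one space, and trims. */
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}]+", " ")
                .trim();
    }

    /** Returns true only if the feature is enabled and the phrase occurs as whole words. */
    boolean matches(String transcript) {
        if (!enabled || wakePhrase.isEmpty()) {
            return false;
        }
        // Pad with spaces so that "they dicio" does not match "hey dicio".
        String padded = " " + normalize(transcript) + " ";
        return padded.contains(" " + wakePhrase + " ");
    }
}
```

A background service would call `matches(...)` on each partial or final transcript and start the assistant on the first `true`. A dedicated keyword-spotting model would replace this text comparison entirely.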

WuerfelDev commented 2 years ago

It seems like this is a privilege of system apps in Android 12 :(

The impact of these changes is as follows:

  • Nonsystem apps using the AlwaysOnHotwordDetector class fail to compile against the Android 12 API because the API was removed from the public surface.
  • Existing system apps using the AlwaysOnHotwordDetector class might be denied from using sound trigger features at runtime. To address this issue and allow these apps to access the microphone through sound trigger, declare the RECORD_AUDIO and CAPTURE_AUDIO_HOTWORD permissions for these apps.

https://source.android.com/setup/start/android-12-release#alwaysonhotworddetector

Stypox commented 2 years ago

I won't use Android's wakeword recognition, as I think it would require Google Play Services or something like that, so I will be using just a normal background service

WuerfelDev commented 2 years ago

I'm curious how it'll work out. I'm not yet convinced, since that means constantly active audio processing, which could affect system performance and battery. Also what happens when another app wants to access the microphone? (I'm not sure if it can be used by multiple apps simultaneously.) I'd love to be proven wrong :)

Stypox commented 2 years ago

> Also what happens when another app wants to access the microphone?

You are correct that only one app at a time can access the microphone. Let's say app 1 (e.g. Dicio) is using the microphone. Then app 2 (e.g. a messaging app, where a user wants to record an audio message) wants to use the microphone, too. Android takes control of the microphone away from app 1 and gives it to app 2 (so that e.g. the user can record the audio). When app 2 has finished using the microphone (e.g. the user finished recording the audio), control is given back to app 1. While app 1 has no control over the microphone, it just receives completely silent input when it tries to read audio, so nothing bad happens. When it resumes getting audio, it does so as you would expect. I tested this with Dicio as app 1 and Telegram as app 2, and also vice versa, and everything worked as explained above, with neither app having any problem.
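Since the background app just reads silence while another app holds the microphone, a wake-word loop could notice this and skip the recognizer for those buffers. Below is a rough plain-Java sketch. It assumes that "completely silent" means every 16-bit PCM sample is exactly zero, which the comment above does not state. `MicLossTracker` and its threshold are hypothetical names, not part of Dicio or Android.

```java
/** Hypothetical helper: guesses that the microphone was handed to another app. */
final class MicLossTracker {
    private final int silentBuffersThreshold; // how many all-zero buffers in a row count as "lost"
    private int consecutiveSilent = 0;

    MicLossTracker(int silentBuffersThreshold) {
        this.silentBuffersThreshold = silentBuffersThreshold;
    }

    /** Returns true if each of the first `length` samples is exactly zero (assumed silence format). */
    static boolean isDigitalSilence(short[] samples, int length) {
        for (int i = 0; i < length; i++) {
            if (samples[i] != 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Feed one buffer as read from the audio source.
     * Returns true while the microphone is presumed to be held by another app.
     */
    boolean onBuffer(short[] samples, int length) {
        if (isDigitalSilence(samples, length)) {
            consecutiveSilent++;
        } else {
            consecutiveSilent = 0; // real audio came back, control was returned
        }
        return consecutiveSilent >= silentBuffersThreshold;
    }
}
```

A real microphone in a quiet room still produces small non-zero noise, so requiring several exactly-zero buffers in a row should rarely trigger by accident. That is a heuristic, not a guarantee.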

hobbycommandline commented 1 year ago

I tried using Vosk as a hotword detector but didn't have any luck; music playing from the phone makes Vosk unable to recognize any words. It seems Athena uses CMU PocketSphinx so that's probably the wake word detector to use.

hobbycommandline commented 1 year ago

I've decided that wake word activation would probably be best done as a separate app; some users don't want wake words, others want them on all the time. Me, I just want one for cooking that can wake up while music is playing.

My thought is, when you want wake word mode activated, you:

  1. send a startActivity() for wake-word service
  2. Digital assistant kills own instance, or waits in background mic off
  3. the user can pick what service to use (and set a default)
  4. the wake word app receives an intent on startup that it then activates when the wake word is detected.
  5. wake word is detected; launches the provided Digital Assistant intent (probably a VOICE_ACTION but it could be anything)
  6. wake word kills own instance in background
  7. Assistant pauses music, handles the voice interaction, resumes music
  8. Assistant can decide to start wake-word service again or not on interaction end

My only concern is that the wake word service should be kill-able by either touch interaction or spoken interaction (a wake-kill word or something).
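The hand-off in the steps above can be read as a small state machine shared by the assistant and the wake-word app. The sketch below models only the transitions, with no Android intents. All state and event names are made up for illustration and are not part of any existing API.

```java
/** Hypothetical model of the proposed assistant <-> wake-word app hand-off. */
final class WakeHandoff {
    enum State {
        IDLE,             // no wake-word service running
        LISTENING,        // wake-word app holds the mic, assistant is gone or waiting with mic off
        ASSISTANT_ACTIVE  // wake-word app has exited, assistant handles the interaction
    }

    enum Event {
        START_REQUESTED,            // step 1: assistant starts the wake-word service
        WAKE_WORD_DETECTED,         // steps 5-6: launch the assistant intent, wake-word app exits
        INTERACTION_ENDED_RESTART,  // step 8: assistant chooses to start the wake-word service again
        INTERACTION_ENDED_STOP,     // step 8: assistant chooses not to restart it
        STOP_REQUESTED              // user kills the wake-word service by touch or by a spoken kill word
    }

    private WakeHandoff() {}

    /** Returns the next state; events that make no sense in the current state are ignored. */
    static State next(State state, Event event) {
        switch (state) {
            case IDLE:
                return event == Event.START_REQUESTED ? State.LISTENING : state;
            case LISTENING:
                if (event == Event.WAKE_WORD_DETECTED) return State.ASSISTANT_ACTIVE;
                if (event == Event.STOP_REQUESTED) return State.IDLE;
                return state;
            case ASSISTANT_ACTIVE:
                if (event == Event.INTERACTION_ENDED_RESTART) return State.LISTENING;
                if (event == Event.INTERACTION_ENDED_STOP) return State.IDLE;
                return state;
            default:
                return state;
        }
    }
}
```

Keeping the transitions in one pure function makes it easy to check that the wake-word service can always be stopped while it is listening, which is the concern raised above.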

I'm going to try and make an app like this. I have my own FOSS digital assistant app, but I figured such a project would benefit us both and allow for user choice if the wake-service API is well defined.

I'll let you know when it's ready, in case you want to use it for your app.

hobbycommandline commented 1 year ago

Oh darn turns out PocketSphinx isn't so great for hot word detection either while the phone is playing music. https://github.com/hobbycommandline/wake-word-pocketsphinx is what I threw together really quickly. You can try it yourself if you want, it's a minimal implementation of the proposed API and it will launch your AI if you have VOICE_ACTION set up properly, but it just has a lot of trouble hearing over music.

I was able to get it to hear over music on an emulator, but once it was on my phone, with the music and the recording happening on the same device, it refused to cooperate. I don't know of any other easy-to-test FOSS hotword systems; Porcupine requires a license.

hobbycommandline commented 1 year ago

Ah maybe https://github.com/mozilla/androidspeech/network/members DeepSpeech by mozilla. I'll have to give that a try another day

Looks like snowboy might be the best bet for now (https://github.com/Kitt-AI/snowboy/releases), according to https://rhasspy.readthedocs.io/en/latest/wake-word/. It's a defunct project, but all we really need is one good FOSS model. Or a clap detector, but those are annoying.

Stypox commented 1 year ago

Thank you for letting me know! I don't know if it is the best idea to have two separate apps for assistance and wake-word recognition. The average user would need to manually install and configure two apps, while I would like Dicio to be ready right after being installed. Users who do not want wake words can just not enable the service (the model would not be downloaded in that case, to save space).

Would it be possible for you to bundle the app you are talking about as a library instead? That would allow both creating a separate app to suit your needs, and also embedding the library into Dicio for easy setup.

By the way, I think there is no need to disable wake-word recognition when the assistant starts listening: Android already takes care of only sending the audio stream to one app at a time. So the wake-word app can just be continuously listening in the background, and sending intents to the assistant whenever a wake word is recognized, without needing a more complex API. So yeah, just having wake-word recognition as a separate app might work out of the box already.

Stypox commented 1 year ago

I tested the app you posted above and it seems to work fairly well without music. With music, though, as you said it has some problems. (btw, the app actually consistently crashes whenever it is able to recognize the wakeword, but it's not important ;-) )

hobbycommandline commented 1 year ago

Yeah I programmed it to quit or crash after launching an assistant as the assistant is no longer needed. I didn't check if an assistant was found at all, which would be good to add, as well as an argument/setting to keep it alive (and allow it to run in the background, add language support etc), but I want something that works with music first before dedicating the time to making the app complete.

hobbycommandline commented 1 year ago

And yes, I will bundle it into a library when I find a better detector, and make an API to detect whether the app is installed, so you can use the app instead of the library if detected (or ignore the app at your discretion). You can use intents for same-process communication, so the library will have a very similar API. The reason I think an optional app + library is better than just a library is that if anyone else wants to build their own improved wake word service, users can replace which wake service Dicio uses without having to write any code at all: just download -> new wake app detected -> user can choose to use the new one or keep the old.
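The "download -> new wake app detected -> user chooses" flow could be reduced to two small decisions: which provider to use right now, and which installed providers the user has not been asked about yet. This is a plain-Java sketch under stated assumptions. Providers are identified by string ids, and how installed wake apps are discovered on Android is left out. `WakeServiceChooser` and the ids used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical selection logic between a bundled wake-word library and external wake apps. */
final class WakeServiceChooser {
    private WakeServiceChooser() {}

    /**
     * Uses the user's explicit choice if that app is still installed;
     * otherwise falls back to the bundled library ("keep the old").
     */
    static String choose(List<String> installedAppIds, String bundledId, String userChoiceId) {
        if (userChoiceId != null && installedAppIds.contains(userChoiceId)) {
            return userChoiceId;
        }
        return bundledId;
    }

    /** Returns installed wake apps the user has not been asked about yet, in discovery order. */
    static List<String> newlyDetected(List<String> installedAppIds, Set<String> alreadySeen) {
        List<String> fresh = new ArrayList<>();
        for (String id : installedAppIds) {
            if (!alreadySeen.contains(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }
}
```

Falling back to the bundled id when the chosen app has been uninstalled keeps the assistant usable without extra setup, in line with the goal of having Dicio ready right after installation.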

Stypox commented 1 year ago

> but I want something that works with music first before dedicating the time to making the app complete.

Sure, I got it that you were just playing around

> so you can use the app instead of the library if detected (or ignore the app at your discretion)

Great! Yeah that would be good

> if anyone else wants to build their own improved wake word service

Makes sense and is nice to have

hobbycommandline commented 1 year ago

Darn, snowboy also does not work with music playing. I tried this one and it didn't work unless music was off: https://github.com/Kitt-AI/snowboy/blob/master/examples/Android/README.md

nshmyrev commented 1 year ago

If you have music playing on the phone, you probably need AEC (acoustic echo cancellation) to record sound properly, whether you use Vosk or Snowboy: https://developer.android.com/reference/android/media/audiofx/AcousticEchoCanceler
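To illustrate what echo cancellation does, here is a toy normalized-LMS (NLMS) canceller in plain Java. It learns how the played-back music leaks into the microphone and subtracts that estimate, so the recognizer sees mostly the remaining signal. It is not the Android `AcousticEchoCanceler` linked above, which is a platform effect attached to an audio session. The class name, tap count and step size are illustrative assumptions, and a production AEC is considerably more involved.

```java
/** Toy NLMS echo canceller, for illustration only (not the Android platform effect). */
final class NlmsEchoCanceller {
    private final double[] weights;  // adaptive estimate of the speaker-to-microphone echo path
    private final double[] history;  // most recent playback samples, newest at index 0
    private final double stepSize;   // adaptation speed, typically between 0 and 1

    NlmsEchoCanceller(int taps, double stepSize) {
        this.weights = new double[taps];
        this.history = new double[taps];
        this.stepSize = stepSize;
    }

    /**
     * Processes one sample pair.
     * @param playback sample currently being played through the speaker
     * @param mic      sample captured by the microphone (voice plus echo of the playback)
     * @return the microphone sample with the estimated echo removed
     */
    double process(double playback, double mic) {
        // Shift the playback history by one and store the newest sample in front.
        System.arraycopy(history, 0, history, 1, history.length - 1);
        history[0] = playback;

        double estimate = 0.0;
        double energy = 1e-9; // small constant avoids division by zero during silence
        for (int i = 0; i < weights.length; i++) {
            estimate += weights[i] * history[i];
            energy += history[i] * history[i];
        }

        double error = mic - estimate; // what is left after removing the estimated echo

        // NLMS update: move the weights along the playback history, normalized by its energy.
        double gain = stepSize * error / energy;
        for (int i = 0; i < weights.length; i++) {
            weights[i] += gain * history[i];
        }
        return error;
    }
}
```

The filter needs the playback signal as a reference, which is why music from the same device is the case an echo canceller can address, while a general noise suppressor has no such reference.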

hobbycommandline commented 1 year ago

Ah thank you, I had seen NoiseSuppressor but not that one, I'll give that a try and see if there's anything else good in that package that might help!

primesun commented 1 year ago

> It seems like this is a privilege of system apps in Android 12 :(

There's an open issue here to address that: https://issuetracker.google.com/issues/204085255

Sazu-bit commented 2 months ago

Has there been any headway on this (nearly two years later)? I'm keen to switch to Dicio, but the missing wake word is what's holding me back. I have rooted my device and am running Android 12 (for now; I plan to switch to something like GrapheneOS if I can).

Stypox commented 1 month ago

https://github.com/Nailik/rhasspy_mobile implements on-device wakeword, might be worth taking a look.