Voice integration - Githubissues

mariankh1 commented 1 month ago

A user can communicate with the conversational AI mail assistant either via text or via voice. We need to:

[ ] find a suitable, free, open-source, yet effective Android library for voice to text
[ ] integrate the library into the app
[ ] transform the replies of the model into voice
[ ] test the voice message playback (text to voice)
[ ] test the voice message recording from the user (voice to text)
[ ] create the new UI to support this

poulami-mukherjee commented 1 month ago

I am documenting the voice integration library options in this Notion document: https://www.notion.so/Text-to-Speech-TTS-Libraries-10f0a4197a26801b8178f91ac8613812

poulami-mukherjee commented 1 month ago

The AI mail assistant should let users control it entirely by voice, without needing to touch their device. This is important because it makes the app more convenient, especially when multitasking or for people with physical limitations.

Option 1: Hands-free voice assistant similar to Alexa or Siri (Ideal and Recommended)

Wake Word Detection: This will detect the wake word "Maily" to start listening for commands. More details on available library options can be found here
Speech Recognition: After detecting the wake word, the app should recognize the user's command (e.g., "Fetch my latest email"). More details on available models and library options can be found here
Command Processing: The app processes the recognized command and fetches the requested data (e.g., reads unread emails and summarises their content) [We are already doing this by interacting with Mistral-Nemo-Instruct-2407 via the Hugging Face API] ✅
Conversational Response (TTS): The app uses Text-to-Speech to respond conversationally by converting the text response returned from the Hugging Face API into voice audio. More details on Text to Speech libraries can be found here
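
To make the four steps above concrete, here is a minimal, plain-Java sketch of how a transcript could flow through them. It is only an illustration under assumptions, not the app's actual code: the class `VoicePipeline`, its methods, and the regex-based wake word check on an already-transcribed string are all hypothetical, and the `commandProcessor` function merely stands in for the existing model call. A real wake word detector would work on audio, not text.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: wake word -> command -> reply text that would be handed to TTS. */
final class VoicePipeline {
    private final Pattern wakePattern;
    // Assumption: stands in for the existing call to the model behind the Hugging Face API.
    private final Function<String, String> commandProcessor;

    VoicePipeline(String wakeWord, Function<String, String> commandProcessor) {
        // Match the wake word as a whole word, case-insensitively, skip any
        // punctuation or whitespace after it, and capture the rest as the command.
        this.wakePattern = Pattern.compile(
                "\\b" + Pattern.quote(wakeWord) + "\\b[\\s,.!?:]*(.*)",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        this.commandProcessor = commandProcessor;
    }

    /** Returns the command spoken after the wake word, or empty if there is none. */
    Optional<String> extractCommand(String transcript) {
        Matcher m = wakePattern.matcher(transcript);
        if (!m.find()) {
            return Optional.empty(); // wake word not heard: ignore the utterance
        }
        String command = m.group(1).trim();
        return command.isEmpty() ? Optional.empty() : Optional.of(command);
    }

    /** Returns the text reply to speak, or empty if the utterance should be ignored. */
    Optional<String> replyFor(String transcript) {
        return extractCommand(transcript).map(commandProcessor);
    }
}
```

In the app, the transcript would come from whichever speech recognition library is chosen, and a non-empty reply would be passed to the Text-to-Speech engine. Keeping those two ends outside this class means the wake word and command logic can be unit tested without a device.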

poulami-mukherjee commented 1 month ago

Other options:

Option 2: Listening Continuously
The app continuously listens for the wake word ("Maily"). Once it detects the wake word, it processes the user's command and responds with a summary of unread emails using Text-to-Speech (TTS).
- Why it's not ideal: It consumes more battery, raises privacy concerns, can lead to accidental activations, and strains system performance.

Option 3: Push-to-Talk
The user manually presses a button (on-screen or hardware) to activate the voice assistant and then issues commands. The app then processes the command and responds via TTS.
- Why it's not ideal: It reduces hands-free convenience, introduces friction into the user experience, and limits accessibility, especially for people with physical limitations or when multitasking.

mariankh1 commented 1 month ago

This is a potentially interesting library: https://github.com/gotev/android-speech

MailyDaily / MailyDailyAndroid

Voice integration #15