MailyDaily / MailyDailyAndroid

MailyDaily is an Android mobile app that uses AI to provide daily summaries of your mail inbox and suggest actions like replying or archiving. It offers a conversational interface and voice assistance while ensuring your data is processed locally for privacy.
GNU General Public License v3.0

Voice integration #15

Open mariankh1 opened 1 day ago

mariankh1 commented 1 day ago

A user can communicate with the conversational AI mail assistant either via text or via voice. We need to

poulami-mukherjee commented 1 day ago

I am documenting the voice integration library options in this Notion document: https://www.notion.so/Text-to-Speech-TTS-Libraries-10f0a4197a26801b8178f91ac8613812

poulami-mukherjee commented 23 hours ago

The AI mail assistant should let users control it entirely by voice, without needing to touch their device. This is important because it makes the app more convenient, especially when multitasking or for people with physical limitations.

Option 1: Hands-free voice assistant similar to Alexa or Siri (Ideal and Recommended)

  1. Wake Word Detection: This will detect the wake word "Maily" to start listening for commands. More details on available library options can be found here

  2. Speech Recognition: After detecting the wake word, the app should recognize the user's command (e.g., "Fetch my latest email"). More details on available models and library options can be found here

  3. Command Processing: The app processes the recognized command and fetches the requested data (e.g., reads unread emails and summarises their content). [We are already doing this by interacting with Mistral-Nemo-Instruct-2407 via the Hugging Face API] ✅

  4. Conversational Response (TTS): The app uses Text-to-Speech to respond conversationally, converting the text response returned from the Hugging Face API into voice audio. More details on Text to Speech libraries can be found here
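
To make the four steps above concrete, here is a rough, framework-free sketch of how they could be chained. Everything in it is an assumption for illustration: the class name `VoicePipelineSketch`, the `onUtterance` entry point, and matching the wake word on already-transcribed text (a real wake-word engine would work on audio). The `commandProcessor` function stands in for the existing Mistral-Nemo-Instruct-2407 call, and the returned string is what would be handed to a TTS engine.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of the hands-free flow; not taken from the MailyDaily code base. */
final class VoicePipelineSketch {
    enum State { WAITING_FOR_WAKE_WORD, LISTENING_FOR_COMMAND }

    static final String WAKE_WORD = "maily";

    // Stand-in for step 3 (the existing LLM call); assumed to map a command to reply text.
    private final Function<String, String> commandProcessor;
    private State state = State.WAITING_FOR_WAKE_WORD;

    VoicePipelineSketch(Function<String, String> commandProcessor) {
        this.commandProcessor = commandProcessor;
    }

    State state() {
        return state;
    }

    /**
     * Feeds one transcribed utterance into the pipeline.
     * Returns the text to speak back (step 4), or empty if nothing should be said.
     */
    Optional<String> onUtterance(String utterance) {
        String text = utterance.trim();
        if (state == State.WAITING_FOR_WAKE_WORD) {
            String rest = stripWakeWord(text);
            if (rest == null) {
                return Optional.empty();            // step 1: no wake word, ignore
            }
            if (rest.isEmpty()) {
                state = State.LISTENING_FOR_COMMAND; // wake word alone: wait for the command
                return Optional.empty();
            }
            text = rest;                             // wake word and command in one utterance
        }
        state = State.WAITING_FOR_WAKE_WORD;         // one command per wake word
        if (text.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(commandProcessor.apply(text)); // steps 2-3 result, ready for TTS
    }

    /** Returns the text after the wake word, or null if the utterance does not start with it. */
    static String stripWakeWord(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        if (!lower.startsWith(WAKE_WORD)) {
            return null;
        }
        int end = WAKE_WORD.length();
        if (lower.length() > end && Character.isLetterOrDigit(lower.charAt(end))) {
            return null;                             // e.g. "Mailybox" is not the wake word
        }
        // Drop separators such as the comma in "Maily, fetch my latest email".
        return text.substring(end).replaceFirst("^[\\s,.!?:]+", "").trim();
    }
}
```

With this shape, saying "Maily, fetch my latest email" in one go and saying "Maily" followed by "fetch my latest email" both reach the command processor with the same command, while speech without the wake word is ignored. Keeping the processor behind a plain function would also let the wake-word, recognition and TTS libraries be swapped independently once we pick them.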

poulami-mukherjee commented 23 hours ago

Other options:

mariankh1 commented 8 hours ago

This is a potentially interesting library: https://github.com/gotev/android-speech