PromtEngineer / Verbi

A modular voice assistant application for experimenting with state-of-the-art transcription, response generation, and text-to-speech models. Supports OpenAI, Groq, ElevenLabs, Cartesia AI, and Deepgram APIs, plus local models via Ollama. Ideal for research and development in voice technology.
MIT License

Implement Dynamic Voice Response with Interruption Handling for LLM Outputs #14

Open leonardonhesi opened 3 weeks ago

leonardonhesi commented 3 weeks ago

We propose implementing a feature that allows the LLM to stream its response in smaller chunks (or a similar strategy), so that voice playback can begin as soon as the first chunks are generated rather than after the full response is complete. If the user interrupts the response, playback pauses and the remaining response flow is adjusted dynamically.

This enhancement aims to optimize both cost and processing time by avoiding the need to process or pay for the entire response when an interruption occurs.

Key Objectives:

- Stream the LLM response in small chunks instead of waiting for the full completion.
- Begin text-to-speech playback as soon as the first chunk is available.
- Detect when the user interrupts and pause playback immediately.
- Stop or redirect generation on interruption, so the unused portion of the response is neither processed nor billed.

This feature would improve user experience and efficiency, especially in scenarios where immediate and responsive interactions are crucial.
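A minimal sketch of how this flow could look is below. It assumes the OpenAI Python SDK (>= 1.0) as the streaming backend; `synthesize_and_play` and `listen_for_interrupt` are hypothetical stand-ins for Verbi's TTS and voice-activity-detection layers, not existing APIs of this project.

```python
# Sketch: stream an LLM reply sentence by sentence and stop on interruption.
import threading
from openai import OpenAI

client = OpenAI()
interrupted = threading.Event()


def listen_for_interrupt() -> None:
    """Hypothetical VAD hook: call interrupted.set() when the user starts speaking."""
    ...


def synthesize_and_play(text: str) -> None:
    """Hypothetical TTS hook: convert one sentence to audio and play it."""
    ...


def stream_response(prompt: str) -> str:
    """Speak the reply as it streams in; abort early if the user interrupts."""
    spoken, buffer = [], ""
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if interrupted.is_set():
            stream.close()  # drop the connection so no further tokens are generated
            break
        if not chunk.choices:
            continue
        buffer += chunk.choices[0].delta.content or ""
        # Flush on sentence boundaries so playback starts before the reply is complete.
        if buffer.endswith((".", "!", "?")):
            synthesize_and_play(buffer)
            spoken.append(buffer)
            buffer = ""
    return "".join(spoken)
```

The sentence-boundary flush is just one chunking strategy; fixed token counts or punctuation-aware buffering would work the same way.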

leonardonhesi commented 3 weeks ago

Additionally, we aim to transform the AI assistant into an active participant in meetings. During a meeting, the assistant should respond to queries (using tools that retrieve information from internal systems, the web, and BI), recall points discussed earlier, and offer timely comments. It would wait for a command containing its name (a wake word) before interacting, and it should help maintain a running summary of the action plan and the topics already covered. The system should work reliably in voice-based meeting rooms.
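A minimal sketch of the wake-word gating and running meeting log, assuming transcribed text arrives segment by segment; `WAKE_WORD`, `answer`, and `handle_segment` are hypothetical names for illustration, not existing Verbi functions.

```python
# Sketch: log every utterance, but only respond when the wake word is heard.
WAKE_WORD = "verbi"
meeting_notes: list[str] = []


def answer(query: str, context: list[str]) -> None:
    """Hypothetical stand-in for the LLM response layer, given meeting context."""
    ...


def handle_segment(text: str) -> None:
    """Record every transcribed utterance; respond only after the wake word."""
    meeting_notes.append(text)  # running log, usable later for summaries and action items
    lowered = text.lower()
    if WAKE_WORD in lowered:
        query = lowered.split(WAKE_WORD, 1)[1].strip(" ,.")
        answer(query, context=meeting_notes)
```

Keeping the full transcript in `meeting_notes` is what lets the assistant recall earlier points and produce the summarized action plan on request.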