enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
https://big-agi.com
MIT License
5.21k stars 1.18k forks

[Feedback] Voice Calls (alpha) #175

Open enricoros opened 10 months ago

enricoros commented 10 months ago

Instructions and feedback thread for Voice Calls in big-AGI.

1. Start a Voice call

Note: it's best to start a call from an existing chat, so that both ends (the AI Persona you call, and yourself) have as much context as possible.

There are two ways of initiating a Voice Call from an existing chat:

  1. "Call" button at the bottom (right on desktop, left on mobile)
  2. "Call" button on the persona selector

2. System Check

Make sure all the checks are green, or try to resolve the issues before proceeding. This wizard will only be shown the first time, unless the issues persist.

3. Call Options

During a call, you can switch "Push To Talk" on/off. When it is active (the default), you must press the microphone button before speaking. This is the best way to avoid echoes and other ambient noise.

Note - you can also say the following commands during a call. These single words will be interpreted as system commands:


Known limitations:


🙌 Looking forward to your feedback to prioritize the right integration and development! 🙌

DeFiFoFum commented 10 months ago
  1. does it work, or what are the issues?
     1.1 🟢 Voice-to-text and text-to-voice seem to work really well. I tried a few voices and I think they sound great.
     1.2 🔴 If the AI has a long speech response, it doesn't seem that there is a way to interrupt it.
         1.2.1 I asked a new question during a previous response and it kept going.
         1.2.2 The AI was still speaking and I clicked the back arrow to go back to the chat, and it was still speaking with the call window closed.

  2. how to make it better - what would you improve?
     2.1 Being able to interrupt (1.2).
     2.2 Making it more hands free. The nice part about a call is that you can be hands free.
         2.2.1 Maybe once the AI stops speaking it starts listening again. Or it is always listening during the speech, but it only responds if you say, "excuse me" or something.
     2.3 Be able to see the call conversation in the chat window.

  3. is it useful at all? - how would you add some WOW-Factor
     3.1 It's a good start to conversational AI imho, but I will need to be able to do more with the call AI to make it more useful for me. Things like:
         3.1.1 "Please look up the news for XYZ and tell me about it"
         3.1.2 "Please make an outline of our conversation and add it to the text chat window"
         3.1.3 Hands free would be so great.
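
One possible way to make a long reply interruptible (items 1.2 and 2.1 above) is to speak it sentence by sentence from a queue that can be cancelled, both when the user asks a new question and when the call window closes. The sketch below is only an illustration under that assumption: `SpeechQueue` and its methods are hypothetical names, not big-AGI code, and stopping the audio clip that is already playing is left out.

```typescript
// Hypothetical sketch: one queue per AI reply, drained sentence by sentence.
class SpeechQueue {
  private pending: string[] = [];
  private cancelled = false;

  // Add a sentence to be spoken; ignored once the reply has been cancelled.
  enqueue(sentence: string): void {
    if (!this.cancelled) this.pending.push(sentence);
  }

  // Return the next sentence to speak, or undefined if none is left
  // or the reply was cancelled.
  next(): string | undefined {
    return this.cancelled ? undefined : this.pending.shift();
  }

  // Call this on a new user question or when the call window closes.
  cancel(): void {
    this.cancelled = true;
    this.pending = [];
  }
}
```

The playback loop would call `next()` after each sentence finishes, so a `cancel()` takes effect at the next sentence boundary at the latest.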

vaibhavard commented 10 months ago

Suggestions:

Suno Bark: an open-source alternative to the ElevenLabs API for speech synthesis.

Info about the free and open source speech synthesis model Bark: Bark is a universal text-to-audio model created by Suno, with code publicly available here. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. This demo should be used for research purposes only. Commercial use is strictly prohibited. The model output is not censored and the authors do not endorse the opinions in the generated content. Use at your own risk.

jontybrook commented 10 months ago

does it work, or what are the issues?

how to make it better - what would you improve?

is it useful at all? - how would you add some WOW-Factor

squeeze your brain for more ideas

That's my feedback; hope it's useful. Keep up the good work with big-agi. I use it every day!

agus4402 commented 7 months ago

None of the voice features work in Brave

enricoros commented 7 months ago

None of the voice features work in Brave

Yes, sadly Brave does not support the Web Speech API for voice input.
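
For anyone who wants to check their own browser, a minimal detection sketch follows. It assumes the two global names under which browsers commonly expose the recognition constructor (`SpeechRecognition` and the prefixed `webkitSpeechRecognition`); the helper names and the `SpeechGlobals` type are made up for this example and are not part of big-AGI. Note that finding a constructor does not guarantee working voice input: recognition can still fail at runtime, so the recognizer's error event needs handling as well.

```typescript
// Constructor shape we look for; the real API object has many more members.
type RecognitionCtor = new () => object;

// The subset of the global object we inspect (in a browser: `window`).
interface SpeechGlobals {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Return the recognition constructor if one is exposed, otherwise null.
function findSpeechRecognition(globals: SpeechGlobals): RecognitionCtor | null {
  return globals.SpeechRecognition ?? globals.webkitSpeechRecognition ?? null;
}

// Human-readable result, e.g. for a "System Check" style screen.
function describeVoiceInputSupport(globals: SpeechGlobals): string {
  return findSpeechRecognition(globals)
    ? 'Voice input: recognition API found'
    : 'Voice input: recognition API missing in this browser';
}

// In a browser one might call:
//   describeVoiceInputSupport(window as unknown as SpeechGlobals)
```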

gitwittidbit commented 5 months ago

Having issues in Firefox (on Mac). While I activated speech recognition in the browser settings, it does not seem to work: I talk but I get no reaction from the AI.

githubbozo77 commented 4 months ago

I think the voice feature is great, and I've really been looking for something like this, but it would be best if it really worked like a phone call. Right now it keeps chiming when "listening", which is kind of annoying and disruptive to the conversation, especially if you try to put it in hands-free mode (as opposed to push to talk). I've seen this done in other browser-based chats, where it's more like a stream listening to the microphone.

To get rid of the sound looping, where the AI hears itself speaking and responds to itself, I've seen implementations that shut off the microphone while the computer is speaking, until the sound has stopped playing. In the case I'm talking about, voxta.ai, the microphone icon goes red with a slash to show that it's not listening while the AI is speaking. This stops the sound looping, so it even works without headphones. The implementation they have on voxta.ai works smoothly: you can go back and forth like with Pi AI or the ChatGPT conversation mode. It's really cool. They even have a setting, meant to be used with headphones, that lets you interrupt the AI while it's speaking, because in that mode it listens all the time, even during playback.

If you could get the conversation mode to work more like either of those, this would be the killer app: you get to pick the LLM you want, you get to customize things, and you can have a seamless two-way conversation with just about any LLM there is, especially with all the choices on something like openrouter.ai. It would be very, very cool to be able to have smooth conversations with just about any LLM out there using your software. It'd really get to be like the movie Her. Great job on this software!

One other thing: as it's implemented now, a "call" didn't consistently play the speech responses. It was hit or miss: sometimes it would speak what the AI was saying and other times it wouldn't. It always displayed the response, but every other time it didn't speak it.
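
The turn-taking described in this comment (microphone off while the AI speaks, with an optional headphones mode that allows interruption) can be sketched as a small state machine. This is an illustration only: the state names, event names, and the `allowInterrupt` option are assumptions made for the example and do not describe how voxta.ai or big-AGI are implemented.

```typescript
type CallState = 'listening' | 'thinking' | 'speaking';
type CallEvent = 'userFinishedSpeaking' | 'replyReady' | 'playbackEnded' | 'userInterrupted';

// Hypothetical option: true when the user wears headphones and wants barge-in.
interface CallOptions {
  allowInterrupt: boolean;
}

// Compute the next call state; events that do not apply leave the state unchanged.
function nextState(state: CallState, event: CallEvent, opts: CallOptions): CallState {
  switch (state) {
    case 'listening':
      return event === 'userFinishedSpeaking' ? 'thinking' : state;
    case 'thinking':
      return event === 'replyReady' ? 'speaking' : state;
    case 'speaking':
      if (event === 'playbackEnded') return 'listening';
      if (event === 'userInterrupted' && opts.allowInterrupt) return 'listening';
      return state;
  }
}

// The microphone stays closed during playback unless interruption is allowed,
// which is what prevents the AI from hearing and answering itself.
function micShouldBeOpen(state: CallState, opts: CallOptions): boolean {
  if (state === 'listening') return true;
  return state === 'speaking' && opts.allowInterrupt;
}
```

With `allowInterrupt` set to false the call is strictly half-duplex and should work on speakers; setting it to true trades that safety for the ability to cut a long reply short.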

dagelf commented 2 months ago

This is very nifty, and almost anyone can set it up (as long as they use Google Chrome on desktop).

But... any niceness gets erased when you have a great or funny conversation that's almost impossible to repeat, that you want to screenshot or record... and then you resize the window only to run into this:

Changing the voice will also restart the chat

Yes, resizing the window too. Really?!

At the end of a call, it is not summarized or appended to the chat history (just yet)

Come on!!! What the hell? ...... Sigh. Okay now you've forced me to take a look for myself to see why this simple functionality is so hard that it's not here yet.... :disappointed: