Fermain / -mollify


TTS: Molly may speak using Speech Synthesis #138

Closed Fermain closed 10 months ago

Fermain commented 11 months ago

We have TTS code and we have AI chat code. Make the AI chatbot speak using the TTS module.

StianSto commented 10 months ago

Spent a little time testing and researching this.

It takes anywhere between 9 seconds (simplest query) and 18 seconds from sending a message to Molly to a fully written response. Most of that time, around 85%, is the response being written in from the stream. The first response arrives after about 1 s. I have looked into the possibility of sending text as a stream, and also getting the output as a stream.

Obviously, there will be a delay between Molly writing and talking. Also, it seems like ElevenLabs prefers chunks or sentences to create audio, so the audio will probably be at least a whole sentence or two behind.
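
To make the sentence-sized chunking concrete, here is a minimal sketch of buffering streamed text until a full sentence is available. `SentenceChunker` and its methods are hypothetical names, not part of the existing TTS or chat code, and the boundary rule (., ! or ? followed by whitespace) is an assumption that will misfire on abbreviations such as "e.g. ".

```typescript
// Hypothetical helper: buffers streamed text and releases whole sentences,
// so each sentence can be handed to a TTS service as one chunk.
class SentenceChunker {
  private buffer = "";

  // Feed one streamed fragment; returns any sentences it completed.
  push(fragment: string): string[] {
    this.buffer += fragment;
    const sentences: string[] = [];
    // Assumed boundary: ., ! or ? followed by whitespace. Requiring the
    // whitespace avoids splitting numbers such as "3.5" mid-stream.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.buffer.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next fragment.
    this.buffer = this.buffer.slice(start);
    return sentences;
  }

  // Call once the stream ends to release whatever text is left.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Each array returned by `push` would be sent on to the TTS request, which is why the audio trails the written text by at least one sentence.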

If the idea is not to have a live talking bot, but something like a "play message" button, then this becomes a whole lot simpler of course :)
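
A rough sketch of the "play message" variant, assuming the browser's Web Speech API is used and not ElevenLabs. `playMessage`, `SynthLike` and `UtteranceLike` are hypothetical names made up for illustration; the synthesizer is passed in so the function does not depend on browser globals.

```typescript
// Minimal structural types so the sketch compiles without the DOM lib.
interface UtteranceLike {
  text: string;
}
interface SynthLike {
  cancel(): void;
  speak(utterance: UtteranceLike): void;
}

// Speaks one finished message. Returns false if there is nothing to say.
function playMessage(
  text: string,
  synth: SynthLike,
  makeUtterance: (text: string) => UtteranceLike
): boolean {
  const trimmed = text.trim();
  if (trimmed.length === 0) return false;
  // Stop any message that is still playing before starting this one.
  synth.cancel();
  synth.speak(makeUtterance(trimmed));
  return true;
}

// In a browser the call from a button handler might look like:
// playMessage(message, window.speechSynthesis,
//             (t) => new SpeechSynthesisUtterance(t));
```

Because the whole message already exists when the button is pressed, no stream handling or sentence buffering is needed.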

Fermain commented 10 months ago

Thanks for your investigation. At this time, the effort required is not worth the payoff that we will get from this feature. We will revisit in a short time once the rapidly changing world of AI has rapidly changed some more.