erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and WAV file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0

Handling Concurrent Streaming Requests in My Project #290

Closed Deepanroyal closed 1 month ago

Deepanroyal commented 1 month ago

I'm working on a project that requires efficient handling of multiple concurrent streaming requests. I have some specific requirements and challenges that I'd like advice on:

- Scalability: I need to scale the number of concurrent streams efficiently.
- Resource management: I need to manage GPU and memory usage effectively when handling multiple streams.

Any advice or recommendations on these issues would be greatly appreciated.

erew123 commented 1 month ago

Hi @Deepanroyal

With the XTTS/Coqui TTS engine, one loaded engine can only handle one stream at a time. Multiple streams pushed at it result in the CUDA tensors getting mixed up and just an audio mess. The only solution for multiple simultaneous streams (if using XTTS streaming) is to load multiple TTS engines, at 2GB VRAM plus 500MB system RAM each, together with a queue management system that multiplexes requests between them, or puts a request on hold if no engine is available. I have not written a system that does this, though I do discuss the possibility here https://github.com/erew123/alltalk_tts/issues/63 so that's kind of where it's sitting for now.
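The engine-pool idea above can be sketched roughly as follows. This is a hypothetical illustration, not AllTalk's API: `EnginePool` and the stand-in `fake_engine` are made up for the example, and a real version would hold actual loaded XTTS engine instances instead of callables.

```python
import queue
import threading

class EnginePool:
    """Hold N engine handles; a request blocks until one is free,
    so each engine only ever serves one stream at a time."""

    def __init__(self, engines):
        self._free = queue.Queue()
        for engine in engines:
            self._free.put(engine)

    def stream(self, text, timeout=None):
        # Block until an engine is available; with a timeout, this
        # raises queue.Empty instead of waiting forever.
        engine = self._free.get(timeout=timeout)
        try:
            return engine(text)      # one stream per engine at a time
        finally:
            self._free.put(engine)   # hand the engine back to the pool

# Stand-in for a loaded XTTS engine: any callable turning text into audio.
def fake_engine(text):
    return f"audio:{text}"

# Two "engines" loaded, three concurrent requests: the third request
# simply waits in the queue until one of the two engines frees up.
pool = EnginePool([fake_engine, fake_engine])
results = []
threads = [threading.Thread(target=lambda t=t: results.append(pool.stream(t)))
           for t in ("hello", "world", "again")]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

The key design point is that the queue itself enforces the one-stream-per-engine rule: requests never touch an engine directly, so the CUDA-tensor mixing described above can't occur.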

Currently, none of the other TTS engines I've added support streaming. Engines such as Piper do have a very good real-time factor (RTF) for generation, though how effective they would be depends on how much text you want to convert to speech.

Thanks