erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and WAV file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0

Handling Concurrent Streaming Requests in My Project #290

Closed Deepanroyal closed 1 month ago

Deepanroyal commented 1 month ago

I'm working on a project that requires efficient handling of multiple concurrent streaming requests. I have some specific requirements and challenges that I'd like advice on:

- Scalability: I need to scale the number of concurrent streams efficiently.
- Resource management: I need to manage GPU and memory usage effectively when handling multiple streams.

Any advice or recommendations on these issues would be greatly appreciated.

erew123 commented 1 month ago

Hi @Deepanroyal

With the XTTS/Coqui TTS engine, one loaded engine can only handle one stream at a time. Multiple streams pushed at it result in the CUDA tensors getting mixed up and just an audio mess. The only solution for multiple simultaneous streams (if using XTTS streaming) is to load multiple TTS engines, at 2GB VRAM plus 500MB system RAM each, together with a queue management system that multiplexes requests between them, or puts a request on hold if no engine is available. I have not written a system that does this, though I do discuss the possibility here https://github.com/erew123/alltalk_tts/issues/63 so that's kind of where it's sitting for now.
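The engine-pool idea above can be sketched roughly as follows. This is a hypothetical illustration, not AllTalk's API: `EnginePool` and the stand-in `fake_engine` are made up for the example, and a real version would hold actual loaded XTTS engine instances instead of callables.

```python
import queue
import threading

class EnginePool:
    """Hold N engine handles; a request blocks until one is free,
    so each engine only ever serves one stream at a time."""

    def __init__(self, engines):
        self._free = queue.Queue()
        for engine in engines:
            self._free.put(engine)

    def stream(self, text, timeout=None):
        # Block until an engine is available; with a timeout, this
        # raises queue.Empty instead of waiting forever.
        engine = self._free.get(timeout=timeout)
        try:
            return engine(text)      # one stream per engine at a time
        finally:
            self._free.put(engine)   # hand the engine back to the pool

# Stand-in for a loaded XTTS engine: any callable turning text into audio.
def fake_engine(text):
    return f"audio:{text}"

# Two "engines" loaded, three concurrent requests: the third request
# simply waits in the queue until one of the two engines frees up.
pool = EnginePool([fake_engine, fake_engine])
results = []
threads = [threading.Thread(target=lambda t=t: results.append(pool.stream(t)))
           for t in ("hello", "world", "again")]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

The key design point is that the queue itself enforces the one-stream-per-engine rule: requests never touch an engine directly, so the CUDA-tensor mixing described above can't occur.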

Currently, none of the other TTS engines I've added support streaming. Engines such as Piper do have a very good real-time factor (RTF) for generation, though how effective they would be depends on how much text you want to convert to speech.

Thanks