Closed: Deepanroyal closed this issue 1 month ago
Hi @Deepanroyal
With the XTTS/Coqui TTS engine, one loaded engine can only handle one stream at a time. Pushing multiple streams at it causes the CUDA tensors to get mixed up, producing garbled audio. The only solution for multiple simultaneous streams (if using XTTS streaming) is to load multiple TTS engines, at roughly 2GB VRAM plus 500MB system RAM each, together with a queue management system that multiplexes requests between them, or puts a request on hold when no engine is free. I haven't written a system that does this, though I do discuss the possibility here: https://github.com/erew123/alltalk_tts/issues/63 so that's where it sits for now.
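For what it's worth, the queue/multiplexing layer described above could look roughly like the sketch below. This is purely illustrative, not part of alltalk_tts: `EnginePool` and the stand-in engine factory are hypothetical names, and a real version would hold actual XTTS engine instances (each costing ~2GB VRAM) rather than lambdas.

```python
import queue
import threading

class EnginePool:
    """Blocking pool of TTS engine handles (hypothetical sketch).

    A request waits until an engine is free, so no two streams ever
    share one engine's CUDA context at the same time.
    """

    def __init__(self, engines):
        self._free = queue.Queue()
        for engine in engines:
            self._free.put(engine)

    def synthesize(self, text, timeout=None):
        # Blocks (or raises queue.Empty after `timeout`) until an engine frees up.
        engine = self._free.get(timeout=timeout)
        try:
            return engine(text)
        finally:
            # Always hand the engine back for the next queued request.
            self._free.put(engine)


# Stand-ins for loaded XTTS engines; real ones would each hold ~2GB VRAM.
def make_fake_engine(name):
    return lambda text: f"{name}:{text}"

pool = EnginePool([make_fake_engine("engine0"), make_fake_engine("engine1")])

# Three concurrent requests multiplexed across two engines.
results = []
threads = [
    threading.Thread(target=lambda t=t: results.append(pool.synthesize(t)))
    for t in ("hello", "world", "again")
]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

With two engines and three requests, the third request simply waits until one of the first two returns its engine to the pool.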
Currently, no other TTS engine I've added supports streaming. Engines such as Piper do have a very good real-time factor (RTF) for generation, though how effective they would be depends on how much text you want to convert to speech.
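For context, RTF is just the time spent generating divided by the duration of the audio produced; values below 1.0 mean the engine synthesizes faster than real time. A quick sketch (the numbers are illustrative, not measured Piper figures):

```python
def real_time_factor(generation_seconds, audio_seconds):
    """RTF = time spent generating / duration of the audio produced.

    RTF < 1.0 means the engine generates speech faster than it plays,
    which is what makes a fast engine usable even without true streaming.
    """
    return generation_seconds / audio_seconds

# e.g. 0.5s of compute producing 5s of speech gives an RTF of 0.1
print(real_time_factor(0.5, 5.0))  # -> 0.1
```

So even a non-streaming engine with a low RTF can feel responsive for short utterances, while long passages still incur a noticeable wait before playback starts.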
Thanks
I'm working on a project that requires efficient handling of multiple concurrent streaming requests. I have some specific requirements and challenges that I'd like advice on:
- **Scalability:** I need to scale the number of concurrent streams efficiently.
- **Resource Management:** I need to manage GPU and memory usage effectively when handling multiple streams.
Any advice or recommendations on these issues would be greatly appreciated.