erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models and WAV file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0
816 stars, 91 forks

Multiple Generations at once. #194

Closed RenNagasaki closed 4 months ago

RenNagasaki commented 4 months ago

As I plan to use a self-hosted variation of this service, I'd like to be able to request more than one generation at once. In my use case, more than one user could request a dialog to be voiced at the same time. At the moment I can only generate them one by one, which holds it back substantially.

Describe the solution you'd like: the ability to request multiple generations at once, as far as the hardware can manage.

Describe alternatives you've considered: hosting multiple instances of the tool and queuing requests on my end.
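The alternative described above (several instances plus client-side queuing) could look roughly like the sketch below. It is not AllTalk code: the instance URLs and `send_to_instance` are hypothetical placeholders, and a real client would replace the stub with an HTTP call to each instance's generation endpoint. Each instance gets one worker thread that pulls the next job from a shared queue whenever it is free.

```python
import queue
import threading

# Hypothetical addresses of separately hosted TTS instances (assumption, not AllTalk defaults).
INSTANCES = ["http://127.0.0.1:7851", "http://127.0.0.1:7852"]

def send_to_instance(base_url, text):
    """Hypothetical stub standing in for an HTTP generation request to one instance."""
    return f"{base_url} -> {text}"

def dispatch(texts, instances=INSTANCES):
    """Spread requests over the instances; each instance handles one request at a time."""
    jobs = queue.Queue()
    for i, text in enumerate(texts):
        jobs.put((i, text))  # keep the index so results line up with the input order
    results = [None] * len(texts)

    def worker(base_url):
        # Pull jobs until the shared queue is empty; each thread writes distinct indices.
        while True:
            try:
                i, text = jobs.get_nowait()
            except queue.Empty:
                return
            results[i] = send_to_instance(base_url, text)

    threads = [threading.Thread(target=worker, args=(url,)) for url in instances]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With two instances, `dispatch(["line 1", "line 2", "line 3"])` keeps both busy and returns results in input order, regardless of which instance served each line.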

erew123 commented 4 months ago

Hi @RenNagasaki

Technically, this is already on the Features request list.


But let me ask you a question based on your scenario. When you say "multiple generations" do you mean you:

A) are happy for each generation to queue up behind the others, so if, say, 3 generation requests come in, they would be queued but served in the order they arrived? Basically, a queuing system.

or

B) want it to start generating request 1 and, if requests 2 and 3 come in while request 1 is still processing, immediately start work on requests 2 and 3? So, parallel generation.

Or would either A or B work?
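To make the difference between A and B concrete, here is a minimal sketch under stated assumptions: `fake_generate` is a hypothetical stand-in for one TTS generation call, and neither class nor function reflects AllTalk's actual implementation. Option A uses a single worker draining a FIFO queue, so requests finish in arrival order. Option B hands requests to a thread pool, which only helps if the engine can genuinely run several generations at once.

```python
import queue
import threading
from concurrent.futures import Future, ThreadPoolExecutor

def fake_generate(text):
    """Hypothetical stand-in for one TTS generation call."""
    return f"audio for {text!r}"

class SequentialTTSQueue:
    """Option A: requests wait in a FIFO queue and one worker serves them in arrival order."""

    def __init__(self, generate):
        self._generate = generate
        self._jobs = queue.Queue()
        self.order = []  # records the order in which jobs were actually processed
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, text):
        # Callers get a Future immediately and can wait on it for their own result.
        fut = Future()
        self._jobs.put((text, fut))
        return fut

    def _run(self):
        while True:
            text, fut = self._jobs.get()
            self.order.append(text)
            try:
                fut.set_result(self._generate(text))
            except Exception as exc:  # report engine errors to the caller instead of dying
                fut.set_exception(exc)

def parallel_generate(texts, generate, workers=3):
    """Option B: up to `workers` requests run at the same time (only safe if the engine allows it)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(generate, texts))
```

For example, three calls to `SequentialTTSQueue(fake_generate).submit(...)` are processed strictly one after another, while `parallel_generate` starts all three immediately.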

A should definitely be possible. As for B, I've not yet looked into how feasible it is; there may be limitations in the Coqui scripts/engine that I cannot get around (or in any other TTS engines I will hopefully be adding).

What I can say, however, is that version 2 is very much in the process of being coded. I am hoping to have a lot of new features added in the next big release, (hopefully) including most of the Features request list, plus resolving lots of other little niggles/issues I've had along the way. For example, with API requests you will be able to set defaults, so if something isn't provided in a generation request, it will just use the default you set. I'm also trying to make it modular to a degree, so there will be a base AllTalk and you can then choose which other modules (or TTS engines) to load/use, so hopefully you can keep it as lightweight as you want. There will also be better discovery of the models available for loading, and so on.


Thanks

erew123 commented 4 months ago

I'm going to close this for now, as we discussed it elsewhere.

mercuryyy commented 3 months ago

@erew123 I think A would be great, but if you can manage to get B working, parallel generation would be really amazing for what we are trying to do.

Is there any time frame for the big v2 release? And could you post an update if you are able to test parallel generation?

Thank you so much!

erew123 commented 3 months ago

@mercuryyy All v2 progress is tracked here: https://github.com/erew123/alltalk_tts/discussions/211

I spent quite a lot of yesterday narrowing the requirements files down to the minimal install needed on Windows and Linux, so that I can get an installation routine working, as this version has quite a lot of different requirements. That has certainly eased a big concern/worry for me, as I got V2 running on both Windows and Linux yesterday. It means I feel much more comfortable saying I should have a BETA out within the next week or so, depending on how much time I can spend on it and how smoothly that goes. Take that statement with a pinch of salt, though.

I think I have the core of AllTalk v2 worked out now, and that's been the major hurdle, as everything else (e.g. finetuning) is just add-ons, tidy-up, documentation and so on. I do need to make sure the installation routines run well, and I will have to run through them 4-5x per OS just to clean out bugs, so things like that take 5-8 hours of a day.

As for the question of the queue: no, not yet. To fully achieve that, I need to ensure the subprocess runs multithreaded and that the queue sends each result back to the correct source. Also, I have no idea how different TTS engines will handle multiple simultaneous generation requests on the same hardware. For some engines that may only be possible if they can load multiple copies of the model; e.g. XTTS may only process multiple simultaneous requests if it can load 2x, 3x or 4x the model.pth files into VRAM. But a queue that backs up the requests and processes 1, then 2, then 3, then 4 sequentially should be OK. It's something I will be looking at more towards the end of the BETA or possibly in the production release.
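One common way to combine both ideas is a pool of model copies: each request borrows one loaded copy, so with N copies at most N generations run simultaneously and any extra requests simply wait in line. The sketch below is a hypothetical illustration of that pattern, not how AllTalk or XTTS actually work; `FakeModel` stands in for one copy of model.pth loaded into VRAM, and whether a given engine tolerates several copies on one GPU is exactly the open question above.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class FakeModel:
    """Hypothetical stand-in for one loaded copy of the model weights."""

    def __init__(self, copy_id):
        self.copy_id = copy_id

    def synthesize(self, text):
        time.sleep(0.01)  # simulate generation time
        return (self.copy_id, f"audio for {text!r}")

class ModelPool:
    """Each request borrows one model copy; with n copies, at most n generations run at once.
    Requests beyond that block in get() until a copy is returned, so they effectively queue."""

    def __init__(self, n_copies):
        self._free = queue.Queue()
        for i in range(n_copies):
            self._free.put(FakeModel(i))
        self._lock = threading.Lock()
        self.active = 0  # generations currently running
        self.peak = 0    # highest concurrency observed

    def generate(self, text):
        model = self._free.get()  # wait for a free copy
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        try:
            return model.synthesize(text)
        finally:
            with self._lock:
                self.active -= 1
            self._free.put(model)  # hand the copy back for the next request
```

With `ModelPool(1)` this degrades to the sequential queue (option A); raising the copy count trades VRAM for parallelism (option B). For example, `ThreadPoolExecutor(max_workers=4).map(ModelPool(2).generate, texts)` never runs more than two generations at once.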

mercuryyy commented 3 months ago

@erew123 Thank you for the detailed response; you're doing amazing work with this repo.

I think even starting with the option to load multiple copies of model.pth into RAM would be a great add-on that we can build on.

Parallel requests are probably the most important thing for production.

Thank you for the update!