erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI; however, it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with 3rd Party software via JSON calls.
GNU Affero General Public License v3.0
1.17k stars, 123 forks

Is it possible to add support for StyleTTS2, GPT-SoVITS and EmotiVoice? #289

Closed. shivshankar11 closed this 4 months ago.

shivshankar11 commented 4 months ago

Is your feature request related to a problem? Please describe.
No.

Describe the solution you'd like
Add support for StyleTTS2, which is very fast and has good quality too, as well as GPT-SoVITS and EmotiVoice.

Describe alternatives you've considered
No.

Additional context
They all have Python packages:
https://pypi.org/search/?q=StyleTTS+2
https://pypi.org/project/emotivoice/
https://pypi.org/project/gpt-sovits-infer/
https://pypi.org/project/gpt-sovits-python/

erew123 commented 4 months ago

Hi @shivshankar11

Potentially there's no issue adding them; it's just a question of my time and being able to test them and check for overlapping (conflicting) requirements issues.

I've added this conversation to the Feature requests list for others to reference.

It's also fully possible for you or others to add a TTS engine using the template/guide here: https://github.com/erew123/alltalk_tts/tree/alltalkbeta/system/tts_engines/template-tts-engine

Thanks

danielw97 commented 2 weeks ago

I'm not sure what the objective quality of StyleTTS2 is compared to something newer like F5-TTS, and I'd like to hear your thoughts, @erew123, on maintainability across all these different models, particularly as StyleTTS2 hasn't seen any commits in the last 8 months. At least for me, F5-TTS sounds great so far and has been under continuous development up to this point. If StyleTTS2 is something folks are still interested in, I can try to take a crack at adding it, although I'm newer to contributing to a project like this. Do let me know your thoughts, and thanks for all of your work on this project; it's something I use daily.

erew123 commented 1 week ago

@danielw97 Sorry for taking a while to get back to you! I do have quite a few people asking for StyleTTS2, so I was planning on having a crack at it at some point. So if you would like a shot at it, that would be fantastic!

However, you might want to hang back a couple of days before you do it. I'm working on a PR at the moment with someone who has come up with a better config file management system. As part of having to rip through loads of code, I'm massively improving the debugging in AllTalk, like 600% better! For example, you can now see the last function the code was in, e.g.:

[Screenshot: example of the new debug output]

So, if your code goes wrong or breaks down, you can at least see exactly where it was, which makes it much easier to debug and fix. I've yet to get to the template TTS engine files, but that's on my list this week, so you will get the improved debugging when adding a new TTS engine too. I was also intending to give the documentation for adding a new TTS engine a bit of a spruce-up. You may have noticed the wiki here has been getting a lot of updates recently. I've been having a think about how to massively improve the instructions for adding a TTS engine, so I also planned to get that written this week (planned, at least)!

So if you do fancy a shot, you may want to give it a couple more days; that will make it much, much easier for you to figure things out, or to share debug issues if you need to :)

If you like, I can post back here once I've got it done and ready?

Thanks

danielw97 commented 1 week ago

@erew123 that sounds good to me, if you let me know when the refactoring is done I'll open a new discussion to hopefully chart my progress assuming I don't get stuck.

shivshankar11 commented 1 week ago


StyleTTS2 can replicate the emotion of a given voice sample much better than XTTSv2.

VideoFX commented 4 days ago

Is there support for GPT-SoVITS?

erew123 commented 4 days ago

@VideoFX See the Currently supported engines (https://github.com/erew123/alltalk_tts/wiki/AllTalk-V2-QuickStart-Guide#6-tts-engine-settings) and the Planned list.

erew123 commented 3 days ago

@danielw97 Just FYI, I am working on the "Guide to Integrating New TTS Engines into AllTalk" https://github.com/erew123/alltalk_tts/wiki/Guide-to-Integrating-New-TTS-Engines-into-AllTalk

It looks a bit "holy shit!!", but I'm trying to make it so you can spend maybe 3-5 minutes reading through the top section, then dive into the template files and get going, referring back to the Guide if you need to. I still have some bits to do on the template files and on the guide, but it's getting there! Obviously I'm trying to make all the files simple, so that you can look at the code and go "OK, it tells me what I need to do here, most of the legwork is done, I need to add 5 lines of code to this section, then do the next section, and that was easy to understand."

While tidying up all the code, I've mostly redone the XTTS engine script, though I still need to add some error capturing to it; then I can fully populate that into the template file.

But just saying, I'm working on it... among other things + life.