erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine and is similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and wav file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0
1.14k stars, 118 forks

Are you going to add the latest Parler TTS Large v1 and Mini v1 models? #302

Closed: maxbizz closed this issue 3 months ago

maxbizz commented 3 months ago

Parler TTS just released the Parler TTS Large v1 and Mini v1 models. They produce higher-quality TTS than the previous model, with better speaker consistency. When can we expect the new models to be added to AllTalk TTS?

erew123 commented 3 months ago

Hi @maxbizz

This is already done in the alltalkbeta branch: https://github.com/erew123/alltalk_tts/commits/alltalkbeta

Thanks

maxbizz commented 3 months ago

I just checked with the Large v1 model and it didn't work. The UI hangs when I try to generate more than one sentence, and the output is also poor: it keeps hallucinating and repeating words.

erew123 commented 3 months ago

Hi @maxbizz

I noted this in the help section of the TTS engine.

[Screenshot of the note in the TTS engine's help section]

I also found a few people on Reddit and elsewhere commenting on similar issues, e.g.:

[Screenshots of other users' reports of similar issues]

All I can say at this point is that AllTalk implements Parler exactly as Parler themselves recommend. AllTalk does nothing new, special, or clever beyond Parler's suggested implementation: https://github.com/huggingface/parler-tts?tab=readme-ov-file#-using-a-specific-speaker

If and when Parler update their code to resolve issues like this, you can update Parler within AllTalk as follows:

1) Run the start_environment script at a command prompt/terminal.
2) Once the Python environment has started, run pip install -U git+https://github.com/huggingface/parler-tts.git to update the Parler codebase on your system.
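After updating, it can help to confirm which Parler version the AllTalk Python environment actually sees. Below is a minimal sketch using only the Python standard library, assuming it is run inside the environment started in step 1. The helper names installed_version and update_command are hypothetical and are not part of AllTalk or Parler, and the distribution names it checks ("parler_tts" and "parler-tts") are assumptions, which is why both spellings are tried.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Repository URL from the pip command in the update steps.
PARLER_GIT_URL = "git+https://github.com/huggingface/parler-tts.git"


def installed_version(names=("parler_tts", "parler-tts")):
    """Return the installed version string, or None if no name matches.

    The exact distribution name is an assumption, so each candidate
    name is tried in order.
    """
    for name in names:
        try:
            return version(name)
        except PackageNotFoundError:
            continue
    return None


def update_command(url=PARLER_GIT_URL):
    """Build the same pip update command as a subprocess argument list.

    Using sys.executable makes pip install into whichever Python
    environment is running this script, so run it inside the AllTalk
    environment rather than a system-wide Python.
    """
    return [sys.executable, "-m", "pip", "install", "-U", url]


# Report the current state without changing anything.
print("Installed Parler version:", installed_version())
print("Update command:", " ".join(update_command()))
```

To apply the update from Python rather than typing the command, pass update_command() to subprocess.run(..., check=True), then call installed_version() again to confirm the version changed.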

There is nothing I can personally do to resolve issues with Parler's codebase or AI model, so I would suggest watching for updates and reporting issues to the Parler team on Reddit or GitHub.

Thanks