Coqui v2 2.0.3 sounding better somehow.

erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and wav file maintenance. It can also be used with 3rd party software via JSON calls.

GNU Affero General Public License v3.0

This is a bit different from erew123's chat findings from back in November here, and I'm not sure why. I just tried training over the Coqui v2 2.0.3 model and got less of a "feels like they are reading from a script" quality when the voice talks.

Accuracy is similar or better, and the flow is better. I'd recommend trying it out again with the method below just to make sure. Link here: https://huggingface.co/coqui/XTTS-v2/tree/v2.0.3

For anyone here who wants to try: back up the files in \extensions\alltalk_tts\models\xttsv2_2.0.2, download all the new files, and put them in that folder. Then train over the base model again.
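If you'd rather script the backup step than copy by hand, here is a rough sketch. The `_backup` folder name and both function names are my own invention, not anything AllTalk defines, and the path assumes you run it from the folder that contains `extensions`. The optional download helper assumes the `huggingface_hub` package is installed and that the `v2.0.3` branch in the link above can be passed as a revision. I have not checked that every file it pulls is needed by AllTalk.

```python
import shutil
from pathlib import Path


def backup_model_dir(model_dir: Path) -> Path:
    """Copy model_dir to a sibling '<name>_backup' folder and return that path.

    The '_backup' suffix is a hypothetical naming choice, not an AllTalk convention.
    """
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model folder not found: {model_dir}")
    backup_dir = model_dir.with_name(model_dir.name + "_backup")
    if backup_dir.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"backup already exists: {backup_dir}")
    shutil.copytree(model_dir, backup_dir)
    return backup_dir


def download_v203(model_dir: Path) -> None:
    """Optional: fetch the v2.0.3 files into model_dir (needs huggingface_hub)."""
    # Imported lazily so the backup function works without the package.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="coqui/XTTS-v2",
        revision="v2.0.3",
        local_dir=str(model_dir),
    )


if __name__ == "__main__":
    # Assumed location, relative to the folder that holds 'extensions'.
    model_dir = Path("extensions") / "alltalk_tts" / "models" / "xttsv2_2.0.2"
    print("Backed up to:", backup_model_dir(model_dir))
    # download_v203(model_dir)  # uncomment to pull the new files
```

Downloading the files manually from the link works just as well; the script only saves you from forgetting the backup.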

Not sure if this was the difference: I merged 3 of the most accurate samples from the previous trainings in the voices folder into a single .wav file, placed it in the \extensions\alltalk_tts\finetune\put-voice-samples-in-here folder, and used it for finetuning (in addition to the original dataset). I then used that same merged .wav file for the voice selection.
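For the merge itself, any audio editor will do, but a minimal sketch using only Python's standard `wave` module could look like the one below. It is not necessarily how I did it. The function name and the sample file names are made up for illustration, and it assumes all the samples are uncompressed PCM .wav files with the same channel count, sample width, and sample rate (it raises an error otherwise rather than resampling).

```python
import wave
from pathlib import Path


def merge_wavs(inputs, output):
    """Concatenate PCM .wav files end to end into one output file."""
    fmt = None      # (channels, sample width in bytes, sample rate)
    chunks = []
    for path in inputs:
        with wave.open(str(path), "rb") as wf:
            this_fmt = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                # No resampling here: mismatched files must be converted first.
                raise ValueError(f"{path} has format {this_fmt}, expected {fmt}")
            chunks.append(wf.readframes(wf.getnframes()))
    if fmt is None:
        raise ValueError("no input files given")
    with wave.open(str(output), "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for chunk in chunks:
            out.writeframes(chunk)


if __name__ == "__main__":
    # Hypothetical file names; substitute your own three best samples.
    samples = [Path("sample1.wav"), Path("sample2.wav"), Path("sample3.wav")]
    merge_wavs(samples, Path("merged.wav"))
```

The merged file then goes into the put-voice-samples-in-here folder and is also the one picked for voice selection.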

Coqui v2 2.0.3 sounding better somehow. #122