erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and wav file maintenance. It can also be used with 3rd-party software via JSON calls.
GNU Affero General Public License v3.0
944 stars, 110 forks

How to fix robotic/metallic voice when adding new voices #219

Closed. guispfilho closed this issue 4 months ago

guispfilho commented 4 months ago

I installed AllTalk as part of text-generation-webui, and with the standard voices it works fine. The problem appears when I add new files to the "voices" folder. I took some .mp3 voice samples, converted them to .wav, and placed them in the "voices" folder, but the generated speech sounds really metallic for some reason. The voice samples have no background noise and are longer than 1 minute. The converter offers several parameters, and I don't know whether I'm choosing the right ones: "mono" or "stereo", sample rate (22050Hz, 44100Hz, 352800Hz, etc.) and encoding (Signed 16-bit PCM, Signed 32-bit PCM, 64-bit float, etc.). Any suggestions on how to fix this robotic voice issue?

erew123 commented 4 months ago

Hi @guispfilho

Please go to the settings and documentation page:

image

and then to the section on using voice samples:

image

Beyond that, a lot depends on the voice sample you are using. The XTTS model tries to reproduce the sound of the voice you give it. If that voice strays from a "normal" human voice, e.g. a cartoon character, the model may have difficulty reproducing it. If your samples are good and set up correctly and the result is still poor, you can try finetuning to improve the audio reproduction.
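
For the conversion parameters asked about above, here is a minimal Python sketch of one way to convert an .mp3 sample and check the result. The target format (mono, 22050Hz, signed 16-bit PCM) and the idea of trimming to a short clip are assumptions that are not stated in this thread, so treat the documentation page as the authority on the exact values. The function names are hypothetical helpers, and `convert` assumes `ffmpeg` is installed and on your PATH.

```python
import subprocess
import wave

# Assumed target format (not confirmed in this thread): mono, 22050 Hz, 16-bit PCM.
TARGET_CHANNELS = 1
TARGET_RATE = 22050
TARGET_BITS = 16


def build_ffmpeg_command(src, dst, start=0.0, duration=None):
    """Build the ffmpeg argument list for converting src to the assumed WAV format."""
    cmd = ["ffmpeg", "-y", "-i", str(src)]
    if start:
        cmd += ["-ss", str(start)]      # skip this many seconds of the input
    if duration is not None:
        cmd += ["-t", str(duration)]    # keep only this many seconds
    cmd += [
        "-ac", str(TARGET_CHANNELS),    # downmix to mono
        "-ar", str(TARGET_RATE),        # resample to 22050 Hz
        "-c:a", "pcm_s16le",            # signed 16-bit little-endian PCM
        str(dst),
    ]
    return cmd


def convert(src, dst, start=0.0, duration=None):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, start, duration), check=True)


def describe_wav(path):
    """Read the header of a PCM WAV file and report its format."""
    # The wave module only reads PCM files; float-encoded WAVs raise wave.Error.
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bits": wav.getsampwidth() * 8,
            "seconds": wav.getnframes() / rate,
        }


def matches_target(info):
    """True if the file has the assumed channel count, sample rate and bit depth."""
    return (
        info["channels"] == TARGET_CHANNELS
        and info["sample_rate"] == TARGET_RATE
        and info["bits"] == TARGET_BITS
    )
```

Example use: `convert("sample.mp3", "voices/sample.wav", start=5, duration=12)` followed by `describe_wav("voices/sample.wav")` shows what actually ended up in the file, which is a quick way to rule out an unexpected sample rate, stereo output, or float encoding before looking at the quality of the recording itself.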

Thanks