coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0
33.39k stars 4.05k forks

[Feature request] Adjust output audio speed in YourTTS #3966

Open Rakshith12-pixel opened 1 month ago

Rakshith12-pixel commented 1 month ago

Hello,

I have fine-tuned YourTTS on a number of new speakers, and the audio quality and pronunciation are good. However, the output speech is a bit too fast. I have tried post-processing such as resampling, but that changes the pitch.

There is a speed feature available in XTTS v2. Can we have a similar one for YourTTS, or is there a workaround for this?

Some inputs would be highly appreciated.

Thanks

JamesD-git commented 1 month ago

I've tried going through all the config files and changing the length scale, but that didn't seem to work. I think results should improve if you use the VITS backend rather than GlowTTS.

Not actually sure how to do this, but if you figure it out let me know!

JamesD-git commented 1 month ago

Hey there @Rakshith12-pixel, if you are on Mac, head to Users/{User}/Library/Application Support/tts/tts_models.......your_tts and you'll find a config file in there. On line 335 you'll find the length scale setting.

I'm sure there's something similar on Windows, but I don't have a machine to check or adjust it on.
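Since the line number may differ between versions of the config file, it can be safer to change the value by key rather than by position. Below is a minimal sketch, assuming the config is plain JSON and the setting is stored under a key named `length_scale` (possibly nested). The helper names `set_length_scale` and `patch_config` are made up for illustration and are not part of the TTS library.

```python
import json
from pathlib import Path


def set_length_scale(node, value):
    """Set every "length_scale" key found anywhere in a parsed JSON tree.

    Returns the number of keys that were updated.
    """
    updated = 0
    if isinstance(node, dict):
        for key, child in node.items():
            if key == "length_scale":
                node[key] = value  # overwrite in place; dict size is unchanged
                updated += 1
            else:
                updated += set_length_scale(child, value)
    elif isinstance(node, list):
        for child in node:
            updated += set_length_scale(child, value)
    return updated


def patch_config(config_path, value):
    """Rewrite the config file at config_path with the new length scale."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    count = set_length_scale(config, value)
    if count == 0:
        # Assumption check: the key name may differ in your config.
        raise KeyError(f"no length_scale key found in {path}")
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return count
```

Call `patch_config` with the path to your own copy of the config file and the value you want to try. Keeping a backup of the original file first is a good idea.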

Rakshith12-pixel commented 1 month ago

Thanks @JamesD-git. Just for clarification: isn't the backend for YourTTS already VITS? I have tried to add a speed feature similar to the one in XTTS, but couldn't get it working.

Also, I am on Ubuntu 22.04.4 LTS.

Any ideas on how to proceed?

JamesD-git commented 4 weeks ago

@Rakshith12-pixel Yes, the default is VITS; I got that wrong. The length scale in the config is a valid workaround. It works better on long sentences than short ones, and I've found that 2.4 is a good setting. I would definitely like to see this feature properly implemented in a pull request, though!
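For context on why the length scale slows speech without shifting pitch: as far as I understand VITS-style inference, the predicted per-token durations are multiplied by the length scale and rounded up before the audio frames are generated, so values above 1.0 stretch the utterance and values below 1.0 compress it. The toy sketch below illustrates only that arithmetic. The durations are made up and `scale_durations` is a hypothetical helper, not the actual TTS code.

```python
import math


def scale_durations(durations, length_scale):
    """Multiply each predicted duration by length_scale and round up to whole frames."""
    return [math.ceil(d * length_scale) for d in durations]


# Hypothetical per-token durations, in frames, as a duration predictor might output.
predicted = [3.0, 5.0, 2.0]

for scale in (1.0, 1.5, 2.0):
    frames = scale_durations(predicted, scale)
    # A larger total frame count means a longer, slower utterance.
    print(scale, frames, sum(frames))
```

Because the model generates more frames per token instead of stretching an existing waveform, the pitch stays where the model put it, unlike with resampling.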