SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Which E2 TTS model was used on the demo page #4

Closed by fakerybakery 20 hours ago

fakerybakery commented 20 hours ago

Hi, thanks for releasing F5-TTS! I noticed that the demo page lists some comparisons with E2-TTS, and that your inference code has an option to load an E2-TTS checkpoint. Would you mind sharing which E2-TTS checkpoint you’re using? Thanks!

SWivid commented 20 hours ago

The compared samples are from the original E2 TTS demo page: https://aka.ms/e2tts/. Separately, we have reproduced a multilingual E2 TTS on Emilia_ZH_EN (a public in-the-wild dataset); that is the ckpt we released.

By setting

ode_method = 'midpoint'
sway_sampling_coef = 0.

you will obtain a vanilla E2 TTS. That said, these methods (sway_sampling for better performance, with euler for speed-up) are also beneficial for E2 TTS, and indeed for any CFM-based model.
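To illustrate what the two settings control, here is a minimal, dependency-free sketch. It is not the repository's inference code. The helper names `sway_timesteps` and `integrate` are hypothetical, the toy ODE stands in for the learned flow, and the warp formula t + s * (cos(pi/2 * t) - 1 + t) is taken from the sway sampling description in the F5-TTS paper as I understand it. With `sway_sampling_coef = 0.` the time grid stays uniform, which is the vanilla setting described above.

```python
import math


def sway_timesteps(num_steps, sway_sampling_coef=0.0):
    """Return num_steps + 1 flow times in [0, 1] (hypothetical helper).

    Assumed warp: t + s * (cos(pi/2 * t) - 1 + t).
    A coefficient of 0 leaves the uniform grid unchanged.
    """
    s = sway_sampling_coef
    uniform = [i / num_steps for i in range(num_steps + 1)]
    return [u + s * (math.cos(math.pi / 2 * u) - 1 + u) for u in uniform]


def integrate(velocity, x0, times, ode_method="midpoint"):
    """Integrate dx/dt = velocity(t, x) over the given time grid."""
    x = x0
    for t0, t1 in zip(times[:-1], times[1:]):
        dt = t1 - t0
        if ode_method == "euler":
            # One function evaluation per step: cheaper, first-order accurate.
            x = x + dt * velocity(t0, x)
        elif ode_method == "midpoint":
            # Two function evaluations per step: second-order accurate.
            x_mid = x + 0.5 * dt * velocity(t0, x)
            x = x + dt * velocity(t0 + 0.5 * dt, x_mid)
        else:
            raise ValueError(f"unknown ode_method: {ode_method!r}")
    return x


def toy_velocity(t, x):
    """Toy vector field dx/dt = x (exact solution at t = 1 is e * x0)."""
    return x


if __name__ == "__main__":
    vanilla_grid = sway_timesteps(8, sway_sampling_coef=0.0)
    swayed_grid = sway_timesteps(8, sway_sampling_coef=-1.0)  # illustrative value
    print("uniform grid:", [round(t, 3) for t in vanilla_grid])
    print("swayed grid: ", [round(t, 3) for t in swayed_grid])
    for method in ("euler", "midpoint"):
        result = integrate(toy_velocity, 1.0, vanilla_grid, ode_method=method)
        print(method, "error:", abs(result - math.e))
```

In this sketch a negative coefficient moves the intermediate time points toward t = 0, so more of the step budget is spent early in the flow. Euler costs one model evaluation per step where midpoint costs two, which is where the speed-up mentioned above would come from at a fixed number of steps.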

fakerybakery commented 20 hours ago

Thanks for clarifying!