SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Improve inference speed on CPU while keeping flow adherence and accuracy #403

Closed — ErfolgreichCharismatisch closed this issue 19 hours ago

ErfolgreichCharismatisch commented 2 weeks ago

Checks

1. Is this request related to a challenge you're experiencing? Tell us your story.

I was trying to generate speech on CPU with a fine-tuned, reduced safetensors model, but generation was very slow: 7 minutes and 40 seconds for an 8-word sentence of about 40 characters. This is frustrating because my goal is to use F5-TTS as a replacement for coqui-ai/TTS (TTSv2), but at that speed it's hopeless. coqui-ai/TTS is fast at cloning and generation but hallucinates every single time, and they have been unable to fix it, which is what made me switch to F5-TTS.

2. What is your suggested solution?

Increase generation speed so that F5-TTS is competitive with coqui-ai/TTS, without sacrificing flow adherence or accuracy.
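For context on why speed and "flow adherence" pull against each other: flow-matching models like F5-TTS synthesize by numerically integrating a learned velocity field, so inference cost scales with the number of function evaluations (NFE), and cutting steps saves time at the cost of integration error. The toy sketch below is plain Python (not F5-TTS code; `euler_integrate` and the velocity field are made up for illustration) using a field with a known exact solution to make that tradeoff visible:

```python
import math

def euler_integrate(velocity, x0, t1, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=t1 with fixed-step
    Euler. Each step costs one velocity evaluation (one "NFE"), so fewer
    steps means faster inference but a less accurate trajectory."""
    x, t = x0, 0.0
    dt = t1 / steps
    for _ in range(steps):
        x += dt * velocity(x, t)
        t += dt
    return x

# Toy velocity field with a known answer: dx/dt = x with x(0) = 1
# has the exact solution x(1) = e, so the error is easy to measure.
velocity = lambda x, t: x
exact = math.e

for steps in (4, 8, 32):
    approx = euler_integrate(velocity, 1.0, 1.0, steps)
    print(f"NFE={steps:2d}  x(1)={approx:.4f}  error={abs(approx - exact):.4f}")
```

Running it shows the error shrinking as NFE grows, which is the same knob a real speedup would have to respect: any CPU optimization that simply drops steps will trade away faithfulness unless it is paired with a better solver or a faster per-step evaluation.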

SWivid commented 19 hours ago

Will close this issue; feel free to reopen if you have further questions.

We will definitely consider the efficiency problem in future plans.