I was not impressed. I just tried it. I used the Gradio app and ran inference with a video game character, and got hallucination and a bad voice result. What am I missing?

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

https://arxiv.org/abs/2410.06885

MIT License

5.23k stars, 523 forks

I was not impressed. I just tried it. I used the Gradio app and ran inference with a video game character, and got hallucination and a bad voice result. What am I missing? #225

Closed: GPU-server closed this issue 2 hours ago

GPU-server commented 3 hours ago

I just installed this, opened the Gradio app, and inserted a 20-second audio clip of a video game character talking (without music). I gave it a text, and the result had an imaginary word at the beginning (hallucination); the rest was correct but of bad quality.

Am I doing something wrong? Why is everybody talking about this?

SWivid commented 2 hours ago

Maybe you should at least follow our instructions to do inference. Everything is just there in readme, solved issues and discussions with all people's efforts.

GPU-server commented 2 hours ago

> Maybe you should at least follow our instructions to do inference. Everything is just there in readme, solved issues and discussions with all people's efforts.

I moved it (reposted it) to the Discussions section; maybe it fits better there. Could you show me an example with the voice of "Geralt", please? I would like to see how well it can reproduce it (Geralt = a voice from the game The Witcher 3). I would LOVE to see an actual example I can reproduce. Please.