The output content may be mixed with other content

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

https://arxiv.org/abs/2410.06885

MIT License


The output content may be mixed with other content #562

Open zyq-zzz opened 1 day ago

zyq-zzz commented 1 day ago

Checks

[X] This template is only for questions, not feature requests or bug reports.
[X] I have thoroughly reviewed the project documentation and read the related paper(s).
[X] I have searched for existing issues, including closed ones, no similar questions.
[X] I confirm that I am using English to submit this report in order to facilitate communication.

Question details

The model has repeatedly failed to generate speech that matches the "Text to Generate" input; it adds extra content on its own. What causes this, and is there a solution?

hequnbai commented 1 day ago

I have the same issue. I use Mandarin; sometimes it unexpectedly repeats some words, and sometimes it skips a few. It is not very stable. I also fine-tuned on a private dataset: the timbre, prosody, and speaking style get really close to the training and reference set, but the errors remain and even get worse.

Does anyone have a clue about the cause, or a solution? ...
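To make the repeats and skips easier to report, one option is to transcribe the generated audio with any ASR tool and diff the transcript against the text you asked for. The snippet below is only a rough sketch of that comparison using Python's standard `difflib`. The ASR step is not shown and is an assumption, and `normalize` and `diff_transcript` are hypothetical helper names, not part of F5-TTS. It compares character by character so that it also works for Mandarin, where words are not separated by spaces.

```python
import difflib
import unicodedata


def normalize(text: str) -> list[str]:
    """Lowercase and keep only letters and digits, one character per token."""
    return [ch for ch in text.lower() if unicodedata.category(ch)[0] in ("L", "N")]


def diff_transcript(target: str, transcript: str) -> dict[str, list[str]]:
    """Report spans the transcript drops or adds relative to the target text."""
    a, b = normalize(target), normalize(transcript)
    report: dict[str, list[str]] = {"skipped": [], "added": []}
    # autojunk=False keeps difflib from ignoring frequent characters in long texts.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            report["skipped"].append("".join(a[i1:i2]))  # in target, missing from audio
        if tag in ("insert", "replace"):
            report["added"].append("".join(b[j1:j2]))  # in audio, not in target
    return report


if __name__ == "__main__":
    # The transcript string here is a made-up example, not real model output.
    print(diff_transcript("hello world", "hello hello world"))
```

Keep in mind that the ASR tool makes its own mistakes, so a reported difference is a hint to listen to that part of the audio, not proof that the TTS model got it wrong.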

liuhui881125 commented 1 day ago

Open the advanced settings and clear the reference text.

zyq-zzz commented 1 day ago

"Open advanced settings and clear reference text"

That's not the reason. That text is cleared every time it is opened.

thesandi99 commented 1 day ago

Guys, this is not a model with a large parameter count like 3B, 9B, or 90B, so these problems are bound to occur.