Short sentences question, and dataset question

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

MIT License


Checks

[X] This template is only for question, not feature requests or bug reports.
[X] I have thoroughly reviewed the project documentation and read the related paper(s).
[X] I have searched for existing issues, including closed ones, and found no similar questions.
[X] I confirm that I am using English to submit this report in order to facilitate communication.

Question details

First question: does a sampling rate of 16000 Hz have any impact on the audio? Does it have to be 24000 Hz?

Second question: when generating very short sentences of one or two words, for example "Hello" or "Thank you", the output is not generated correctly. How can I adjust this?
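On the sampling-rate question, one point holds regardless of the model: audio sampled at 16000 Hz contains no content above 8000 Hz (its Nyquist limit), so resampling it to 24000 Hz changes the sample count but cannot restore the missing high frequencies. The sketch below assumes, as the question implies, that the model works at 24 kHz. `resample_linear` and `nyquist_hz` are hypothetical helpers written for illustration, and linear interpolation is a crude stand-in for a proper band-limited resampler (for example `torchaudio.functional.resample`).

```python
import math
from fractions import Fraction


def nyquist_hz(sample_rate: int) -> float:
    """Highest frequency a signal at this sample rate can represent."""
    return sample_rate / 2


def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Hypothetical, crude resampler using linear interpolation.

    No anti-aliasing filter is applied, so this only illustrates how the
    sample count changes (e.g. 16 kHz -> 24 kHz multiplies it by 3/2).
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    ratio = Fraction(dst_rate, src_rate)  # exact ratio, e.g. 3/2
    out_len = math.floor(len(samples) * ratio)
    out = []
    for i in range(out_len):
        pos = Fraction(i) / ratio  # fractional position in the source signal
        left = math.floor(pos)
        frac = float(pos - left)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# One second of 16 kHz audio becomes 24000 samples, but still has no
# content above nyquist_hz(16000) == 8000 Hz.
one_second = [0.0] * 16000
print(len(resample_linear(one_second, 16000, 24000)), nyquist_hz(16000))
```

Upsampling with any resampler only interpolates between existing samples, which is why a 16 kHz dataset may sound duller than 24 kHz data even after conversion.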

Issue #433