SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Questions about the baseline models in the paper #7

Open WangHelin1997 opened 11 hours ago

WangHelin1997 commented 11 hours ago

Hi,

Thanks for your nice work and for open-sourcing it! May I ask how you obtained the results for the baseline models in your paper, such as VALL-E 2, MELLE, Voicebox and NaturalSpeech 3 in Table 1? You mentioned that the scores reported in the baseline papers were from different evaluation subsets. When I checked the VALL-E 2 paper, I did not see the corresponding results listed there. Could you please explain the baseline details further? Thanks!

SWivid commented 10 hours ago

Yes, we mention this in Appendix A, Baseline Details. For example, for VALL-E 2 we compared with the results reported in [29], which is the MELLE paper, as the MELLE paper contains most of the VALL-E series scores.

Also, MaskGCT compared with NS3, so we also report the scores from the NS3 paper.

The results are on two subsets (as you can see in Tab. 1).