SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Questions about base models' training time, loss, etc. #406

Closed: ruby11dog closed this issue 2 weeks ago

ruby11dog commented 2 weeks ago

Checks

Question details

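Hello, great work! I made some changes to the model architecture on top of the base model and am retraining the base model from scratch on the Emilia dataset. I'd like to ask: at what loss value does the model start producing proper speech? When you trained the open-source base model, how many machines and how much time did it take to reach proper speech, and how long did it take to converge?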

SWivid commented 2 weeks ago

Hi @ruby11dog, please use English to submit this report in order to facilitate communication (see item 4 under Checks).

The loss is not a meaningful indicator of how training is progressing, because the boundaries of the prediction and the ground truth are mismatched (see #9).

We have posted all results and details of training and evaluation in our paper. The base model was trained on 8×A100 80GB GPUs for over one week to reach 1.2M updates; some aligned (i.e., intelligible) speech can be heard at around 200k to 400k updates.
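A rough back-of-envelope reading of these figures (an interpretation, not something stated in the thread): 200k to 400k updates are about 1/6 to 1/3 of the 1.2M-update run. The sketch below assumes the run took exactly 7 days and that throughput was constant; both are simplifications, and since the run actually took over a week, the results are optimistic lower bounds for the same 8×A100 setup.

```python
# Back-of-envelope time estimate from the figures quoted in the thread.
# Assumption: "over one week" is taken as exactly 7 days, so the results
# are lower bounds on wall-clock time (the real run was longer).
# Assumption: throughput (updates per hour) is constant over the whole run.

TOTAL_UPDATES = 1_200_000  # updates reported for the base model
TOTAL_DAYS = 7.0           # "over one week", taken as 7 days
NUM_GPUS = 8               # 8x A100 80GB


def updates_per_hour(total_updates: int = TOTAL_UPDATES,
                     total_days: float = TOTAL_DAYS) -> float:
    """Average optimizer updates per wall-clock hour over the run."""
    return total_updates / (total_days * 24.0)


def hours_to_reach(target_updates: int) -> float:
    """Wall-clock hours to reach target_updates at the average rate."""
    return target_updates / updates_per_hour()


def gpu_hours(hours: float, num_gpus: int = NUM_GPUS) -> float:
    """Total GPU-hours consumed for a given wall-clock duration."""
    return hours * num_gpus


if __name__ == "__main__":
    rate = updates_per_hour()
    print(f"average rate: {rate:.0f} updates/hour")
    for target in (200_000, 400_000, TOTAL_UPDATES):
        h = hours_to_reach(target)
        print(f"{target:>9,} updates: ~{h:.0f} h ({h / 24:.1f} days), "
              f"~{gpu_hours(h):.0f} GPU-hours")
```

Under these assumptions, intelligible speech would appear after roughly 28 to 56 hours of training on 8 GPUs. With a modified architecture or different hardware, the update rate would need to be re-measured rather than taken from this estimate.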
