SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Request for wandb report to compare loss curves #9

Open nuts-kun opened 9 hours ago

nuts-kun commented 9 hours ago

Hi, thank you for sharing this great work! We have started training this model on a multilingual dataset that includes Japanese, and we would like to compare our loss curve with that of the model trained for the paper to see how the learning progresses. Would it be possible for you to share the wandb report? Best regards.

SWivid commented 9 hours ago

I would be happy to share the curves, but since this is an NAR model, the edges of the ground truth and the prediction do not match exactly, so the training loss quickly drops to around 1 and shows nothing special afterwards; a level centered around 0.6 is pretty good.

image

The most effective way to determine whether training is going normally is to run inference on a sample, e.g. you should hear something intelligible at 200K updates (maybe longer for E2; a bit more patience is all you need).
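Something along these lines works as a periodic check (just a sketch; `synthesize` is a hypothetical stand-in for whatever inference entry point you use, and the sample rate is an assumption):

```python
import torchaudio

def periodic_listen_check(model, step, check_every=10_000, sr=24_000):
    """Sketch of a training sanity check: every `check_every` updates,
    synthesize a fixed sentence and save it for listening.
    `synthesize` is a hypothetical stand-in for the actual inference code."""
    if step % check_every != 0:
        return
    wav = synthesize(model, "Hello, this is a training sanity check.")  # hypothetical helper
    torchaudio.save(f"samples/step_{step}.wav", wav.cpu(), sr)  # expects a (channels, time) tensor
```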

If you find some way to monitor the training process better, you are welcome to share it (e.g. a validation loss, which we haven't tried yet).
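For the validation loss idea, a minimal sketch of what it could look like for a flow-matching model (PyTorch; the `model(x_t, t, cond)` signature and the dataloader format are assumptions, and the loss mirrors the standard conditional flow-matching objective):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def validation_cfm_loss(model, val_loader, device="cuda"):
    """Average conditional flow-matching loss on a held-out set.
    Assumes model(x_t, t, cond) predicts the velocity field, mirroring
    the training objective (the signature is an assumption)."""
    model.eval()
    total, batches = 0.0, 0
    for mel, cond in val_loader:                  # mel: target mel-spectrogram batch
        x1 = mel.to(device)                       # data endpoint of the path
        x0 = torch.randn_like(x1)                 # noise endpoint
        t = torch.rand(x1.size(0), device=device)
        t_ = t.view(-1, *([1] * (x1.dim() - 1)))  # broadcast t over feature dims
        xt = (1 - t_) * x0 + t_ * x1              # linear (OT) interpolation path
        v_target = x1 - x0                        # ground-truth velocity along the path
        v_pred = model(xt, t, cond.to(device))
        total += F.mse_loss(v_pred, v_target).item()
        batches += 1
    model.train()
    return total / max(batches, 1)
```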

nuts-kun commented 9 hours ago

Thank you very much for the quick response! Comparing the shared loss curve with the one from my model, it seems that the training is progressing smoothly :)

image

Thank you again!

SWivid commented 9 hours ago

Oh, one thing might matter. If your multilingual training set includes both ZH and JA, I am not sure whether the model would get confused when facing identically formed Hanzi or Kanji (we just use raw characters and do not distinguish them by language). Just JA or just ZH is fine~
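If you do want both languages in one vocabulary, one simple workaround is to tag each character with its language during preprocessing so identical glyphs map to distinct tokens (just a sketch; the tag format is only an illustration):

```python
def tag_chars(text: str, lang: str) -> list[str]:
    """Prefix every character with a language tag so that e.g. Chinese
    and Japanese uses of the same glyph become distinct vocab entries."""
    return [f"{lang}_{ch}" for ch in text]

vocab: dict[str, int] = {}

def encode(text: str, lang: str) -> list[int]:
    """Map tagged characters to ids, growing the vocab during preprocessing."""
    ids = []
    for tok in tag_chars(text, lang):
        if tok not in vocab:
            vocab[tok] = len(vocab)
        ids.append(vocab[tok])
    return ids

print(encode("海", "zh"))  # [0]
print(encode("海", "ja"))  # [1]  same glyph, different token
```

The trade-off is a larger vocabulary in exchange for unambiguous tokens.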

nuts-kun commented 9 hours ago

I took that into consideration when writing the preprocessing and dataset class, so I think it should be fine ;)