Plachtaa / FAcodec

Training code for FAcodec presented in NaturalSpeech3

What do the loss curves look like during your successful training? #16

Open YuXiangLin1234 opened 1 month ago

YuXiangLin1234 commented 1 month ago

Hello,

I've attempted to train FAcodec on my own dataset, but whether I start from scratch or fine-tune your provided checkpoint, the reconstructed audio clips are just noise. I fine-tuned the model on around 128 hours of Common Voice 18 zh-TW data. After approximately 20k steps the losses seemed to converge: some, like the feature loss, decreased as expected, while others, such as the mel loss and waveform loss, kept oscillating.

Do all losses decrease during your training process?
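For reference, the mel and waveform losses I am referring to are roughly of the following form (an illustrative L1 sketch, not necessarily this repo's exact loss code; the 24 kHz sample rate and mel parameters are assumptions):

```python
import torch
import torchaudio

# Mel spectrogram extractor; parameters are assumed, not taken from FAcodec's config.
mel_fn = torchaudio.transforms.MelSpectrogram(
    sample_rate=24000, n_fft=1024, hop_length=256, n_mels=80
)

def mel_loss(recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # L1 distance between log-mel spectrograms of the reconstruction and the target.
    eps = 1e-5
    return torch.nn.functional.l1_loss(
        torch.log(mel_fn(recon) + eps), torch.log(mel_fn(target) + eps)
    )

def waveform_loss(recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Plain time-domain L1 loss between the two waveforms.
    return torch.nn.functional.l1_loss(recon, target)
```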

Plachtaa commented 1 month ago

Could you please share your voice samples and loss curves? I believe they will help in analyzing the issue you encountered.

YuXiangLin1234 commented 1 month ago

The loss curve looks like:

(loss curve screenshot)

The audio samples are as follows: https://huggingface.co/datasets/mozilla-foundation/common_voice_16_0/viewer/zh-TW

The reconstructed audio sample: https://drive.google.com/file/d/1yk_xZL17FkhIYMjojesd-PHWyAKuqzSA/view?usp=sharing

Plachtaa commented 1 month ago

According to the mel_loss in the loss curve you shared, the model seems to have converged well. However, the reconstructed audio sample sounds as if it was generated by a randomly initialized model. May I know whether the reconstructed sample was retrieved from TensorBoard or produced by a separate reconstruction script?
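For reference, the kind of standalone round-trip check I mean looks roughly like the sketch below. The `codec` object and its `encode`/`decode` calls are placeholders for this repo's actual model and inference interface, and the 24 kHz rate is an assumption; adapt them to your checkpoint-loading code.

```python
import torch
import torchaudio

def reconstruct(codec: torch.nn.Module, in_path: str, out_path: str,
                sample_rate: int = 24000, device: str = "cpu") -> None:
    """Encode and decode one clip with a trained codec and save the result."""
    wav, sr = torchaudio.load(in_path)                        # (channels, time)
    wav = torchaudio.functional.resample(wav, sr, sample_rate)
    wav = wav.mean(dim=0, keepdim=True).to(device)            # mono (1, time)

    codec.to(device).eval()
    with torch.no_grad():
        codes = codec.encode(wav.unsqueeze(0))   # placeholder interface
        recon = codec.decode(codes)              # placeholder interface

    torchaudio.save(out_path, recon.squeeze(0).cpu(), sample_rate)
```

If a check like this also produces noise, the problem is in the checkpoint itself rather than in how the TensorBoard samples are logged.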