FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
https://funaudiollm.github.io/
Apache License 2.0
4.66k stars 471 forks

Training Problem #275

Open blackbird-fish opened 1 month ago

blackbird-fish commented 1 month ago

I trained from scratch on about 700 hours of ZH data, and the audio sounds noisy. Several attempts suggest that the flow model cannot be trained from scratch on big data; it seems to need training on a comparatively small dataset first to get a stable base. Can you share the training strategy?

aluminumbox commented 1 month ago

Well, 700 hours is far from big data. We recommend using cosyvoice.fromscratch.yaml.

blackbird-fish commented 1 month ago

I set the lr and warm-up steps according to cosyvoice.fromscratch.yaml, but the synthesized audio still sounds noisy. Is there any other advice? Thanks a lot.

aluminumbox commented 1 month ago

Our cosyvoice.yaml is trained on 10k+ hours of data. We will provide a ZH-based example later. If your audio is still noisy, I suggest reducing lr and increasing batch size and accum_grad so that training is more stable.
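
To make the suggestion concrete, here is a minimal Python sketch of how batch size and accum_grad combine into the number of samples per optimizer update, and of how lowering the base lr scales a warm-up schedule. All numbers are hypothetical and are not taken from cosyvoice.yaml or cosyvoice.fromscratch.yaml. The helper names (`effective_batch_size`, `warmup_lr`) are made up for illustration, and the schedule is a generic linear warm-up followed by inverse-square-root decay, which may differ from the scheduler the project actually uses.

```python
# Illustrative only: every value here is hypothetical, not a project default.


def effective_batch_size(per_step_batch: int, accum_grad: int, num_gpus: int = 1) -> int:
    """Number of samples that contribute to a single optimizer update."""
    return per_step_batch * accum_grad * num_gpus


def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Generic schedule: linear warm-up to base_lr, then inverse-sqrt decay."""
    step = max(step, 1)  # avoid division by zero at step 0
    return base_lr * min(step / warmup_steps, (warmup_steps / step) ** 0.5)


# Raising accum_grad from 1 to 4 quadruples the samples per update
# without increasing per-step memory use.
before = effective_batch_size(per_step_batch=32, accum_grad=1, num_gpus=4)
after = effective_batch_size(per_step_batch=32, accum_grad=4, num_gpus=4)
print(f"samples per update: {before} -> {after}")

# Halving base_lr halves the learning rate at every step of the schedule.
for step in (500, 2000, 8000):
    print(step, warmup_lr(step, 1e-3, 2000), warmup_lr(step, 5e-4, 2000))
```

Larger effective batches average the gradient over more samples, and a lower peak lr takes smaller steps. Both reduce update noise, at the cost of slower progress per step.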

blackbird-fish commented 4 weeks ago

When will the ZH-based example be provided? Will it come together with the KV cache inference update? Looking forward to your great work!