FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
https://funaudiollm.github.io/
Apache License 2.0

same loss when running two experiments simultaneously under examples/libritts/cosyvoice #662

Open · dbkest opened this issue 3 days ago

dbkest commented 3 days ago

Hi, this is an amazing project. I am using the flow-matching model from this project for training, but the input features are the outputs of a language-model encoder that I trained myself. I modified the data loading process and part of the model structure, and the modified code has been validated on my own data and can synthesize speech.

However, I am now hitting a strange training issue. I am running two experiments simultaneously under examples/libritts/cosyvoice, with the only difference being the number of epochs (200 vs. 1000). Each experiment has its own run.sh and YAML file, and I changed the model and TensorBoard output paths. The training data sit in the same data directory, and I submitted the two experiments to different machines for 8-GPU training. Surprisingly, their losses are exactly the same, down to the last decimal place. Why is this happening? Oddly, when I diffed the checkpoints saved at the same epoch, there were differences.

Background:

Both experiments read the same data from the same disk location; the data include npy files that are loaded with numpy.load in the processor.
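For reference, a minimal sketch of what such a processor step might look like (the function and field names here are illustrative, not the repo's actual code):

```python
import numpy as np

def load_npy_feature(sample: dict) -> dict:
    """Attach a precomputed encoder feature to one data sample.

    `sample["feat_path"]` is an assumed field pointing to a .npy file
    produced by the custom language-model encoder described above.
    """
    sample["feat"] = np.load(sample["feat_path"])
    return sample
```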

Thanks.

aluminumbox commented 2 days ago

Check cosyvoice.yaml: we set the random seed at the beginning, so if you do not change anything the loss will be identical. This is intentional, so users can reproduce the results.
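For context, here is a minimal sketch of how a fixed seed in a training config produces bit-identical losses; the config path, key name, and default value below are assumptions for illustration, not necessarily CosyVoice's actual ones:

```python
import random

import numpy as np
import torch
import yaml

def set_seed(seed: int) -> None:
    # Seed every RNG that affects data shuffling, dropout, and weight init.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# Assumed config path and key; adjust to the experiment's actual YAML.
with open("conf/cosyvoice.yaml") as f:
    cfg = yaml.safe_load(f)

set_seed(cfg.get("seed", 1234))
# With the same seed and the same data, two runs shuffle batches and
# initialize weights identically, so their per-step losses match exactly.
# Give each experiment a different seed if you want independent runs.
```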