Hi, this is an amazing project. I am using the flow-matching training from this project, but my input features are the outputs of a language model encoder that I trained myself. I modified the data loading process and part of the model structure. The full code has been validated on my own data and can synthesize speech.

However, I have run into a strange training issue. I am running two experiments at the same time under examples/libritts/cosyvoice, and the only difference between them is the number of epochs (200 vs 1000). Each experiment has its own run.sh and YAML files, and I have changed the model and TensorBoard storage paths. The training data is in the same data directory, and I submitted the two experiments to different machines for 8-GPU training.

Surprisingly, their losses are exactly the same, down to the last decimal place. However, when I diffed the models saved at the same epoch, there were differences. Why is this happening?
Background:
Both experiments read the same data from the same disk location. The data includes .npy files that are loaded with numpy.load in the processor.
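
To make the model comparison concrete, here is a minimal sketch of the kind of check I mean. It is illustrative only: `compare_params` and the toy values are hypothetical, and in a real setup the parameters would come from the saved checkpoints rather than plain Python lists. It reports the largest absolute difference per comparison, which might help tell tiny floating-point differences apart from a real divergence in the weights.

```python
def compare_params(a, b, atol=0.0):
    """Compare two {name: list of floats} mappings (hypothetical helper).

    Returns (max_abs_diff, names_of_parameters_that_differ_by_more_than_atol).
    """
    if a.keys() != b.keys():
        raise ValueError("parameter names differ between checkpoints")
    max_diff = 0.0
    differing = []
    for name in sorted(a):
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for {name}")
        # Largest element-wise absolute difference for this parameter.
        diff = max((abs(x - y) for x, y in zip(a[name], b[name])), default=0.0)
        if diff > atol:
            differing.append(name)
        max_diff = max(max_diff, diff)
    return max_diff, differing


# Toy stand-ins (assumed values) for two checkpoints saved at the same epoch.
run_200 = {"encoder.weight": [0.10, -0.20, 0.30], "encoder.bias": [0.0, 0.5]}
run_1000 = {"encoder.weight": [0.10, -0.20, 0.30], "encoder.bias": [0.0, 0.5 + 1e-7]}

max_diff, differing = compare_params(run_200, run_1000)
print("max abs diff:", max_diff)
print("differing parameters:", differing)
```

Passing a small `atol` (for example `1e-6`) would treat differences below that threshold as equal, so only larger deviations are listed.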
Thanks.