effusiveperiscope / so-vits-svc

MIT License

Multiple iterations sound worse than a few #27

Closed by mya2152 1 year ago

mya2152 commented 1 year ago

I've got a few generator "G_" files: one saved early in training, among the first few checkpoints (only an hour or two of training), and one from around 48 hours of training (at least 100,000 iterations, or around 9,000 epochs). For some reason, the audio generated from the first, least-trained .pth file sounds much clearer than the audio from the latest one (48-60 hours of training).

Any ideas?

effusiveperiscope commented 1 year ago

Overfitting? How much does the sound from the least trained file resemble your target speaker vs. the latest?
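One way to probe the overfitting question is to render the same input clip with several checkpoints in step order and listen for where quality starts to drop. The following is a minimal sketch that only collects and sorts the checkpoints, assuming they follow the `G_<step>.pth` naming seen in this thread. The `logs/my_model` directory and the function name are hypothetical, and the inference step itself is left out because it depends on your setup.

```python
import re
from pathlib import Path

# Matches generator checkpoints such as "G_2100.pth" (naming as seen in this thread).
_CKPT_RE = re.compile(r"^G_(\d+)\.pth$")


def list_generator_checkpoints(directory):
    """Return (step, path) pairs for G_<step>.pth files, sorted by step."""
    found = []
    for path in Path(directory).iterdir():
        match = _CKPT_RE.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return sorted(found, key=lambda pair: pair[0])


if __name__ == "__main__":
    # "logs/my_model" is a hypothetical directory; point it at your own checkpoints.
    ckpt_dir = Path("logs/my_model")
    if ckpt_dir.is_dir():
        for step, path in list_generator_checkpoints(ckpt_dir):
            print(f"{step:>8d}  {path}")
```

Listening to a handful of checkpoints spread across the run (not just the first and last) makes it easier to tell whether quality degrades gradually or was never good on this dataset.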

mya2152 commented 1 year ago

The latest just sounds washed out, almost as if there are two different voices pitched slightly differently, with many artifacts and poor sound quality. The first one sounds much higher quality but still has some artifacts, as if it needs more training. The thing is, I have another G_ file, downloaded pre-trained off the internet, that sounds incredible. It is named G_496000, which indicates it has been trained for close to 500,000 iterations, and it sounds amazing, with no artifacts. So why would my longer-trained model sound worse than my least-trained one when I train it myself?

Another weird thing: the G_2100 file from Google Colab sounds better than the G_2100 file from Paperspace Gradient's machines. In theory they should all sound the same regardless of machine, no?

effusiveperiscope commented 1 year ago

Quality is determined not just by the amount of training but also by the quality and quantity of the data. Smaller datasets have a lower ceiling on achievable quality. I do not know the size or quality of your dataset compared to the one used to train the 500k-step model.
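The figures quoted earlier in the thread (about 100,000 iterations over about 9,000 epochs) give a rough hint about dataset size, since each epoch is one pass over the data. A small sketch of that arithmetic follows, assuming one batch per iteration. The batch size of 6 is a hypothetical value used for illustration, not one taken from this thread, so substitute the value from your own config.

```python
def steps_per_epoch(total_steps, total_epochs):
    """Average number of optimizer steps (batches) per pass over the data."""
    return total_steps / total_epochs


def approx_clip_count(total_steps, total_epochs, batch_size):
    """Rough number of training clips, assuming one batch per step."""
    return steps_per_epoch(total_steps, total_epochs) * batch_size


if __name__ == "__main__":
    # Figures quoted in the thread: ~100,000 iterations over ~9,000 epochs.
    print(round(steps_per_epoch(100_000, 9_000), 1))  # prints 11.1
    # batch_size=6 is a hypothetical value; use the one from your own config.
    print(round(approx_clip_count(100_000, 9_000, batch_size=6)))  # prints 67
```

Roughly 11 steps per epoch suggests a fairly small training set, which would fit the point about smaller datasets having a lower quality ceiling, though the estimate is only as good as the quoted numbers.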

I have seen some inconsistencies in generations between different systems and PyTorch versions, but I do not know enough about the inner workings to say why this would be the case.
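When comparing results across machines such as Colab and Paperspace, it can help to record what actually differs between the two environments. The sketch below uses only the Python standard library to collect interpreter, OS, and installed package versions. The package list (`torch`, `numpy`) is an assumption about what is relevant, and this only documents differences; it does not explain or remove them.

```python
import platform
import sys
from importlib import metadata


def environment_fingerprint(packages=("torch", "numpy")):
    """Collect interpreter, OS, and package versions for side-by-side comparison."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in environment_fingerprint().items():
        print(f"{key}: {value}")
```

Running this on both machines and diffing the output shows whether the two G_2100 checkpoints were produced under the same library versions.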