SayaSS / vits-finetuning

Fine-Tuning your VITS model using a pre-trained model
MIT License

Only Static Noises? #31

Open Seped69 opened 1 year ago

Seped69 commented 1 year ago

I tried training a character voice with 51 audio-text pairs (47 in train.txt and 4 in val.txt) using the single-speaker setup, and changed the number of epochs to 500. The checkpoint files I got are G_3000, G_4000, D_3000, and D_4000. When I generate audio with any of them, the output is only static noise. Do you know why?
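
Editor's note: in VITS-style training scripts the number in a checkpoint name such as G_4000 is, as far as I know, the global optimizer step at which it was saved, not the epoch. A rough sketch of how steps relate to epochs follows. The batch size of 16 is a made-up value for illustration (the post does not state it), and the real data sampler may produce a slightly different number of batches per epoch.

```python
import math


def steps_for_epochs(num_train_samples: int, batch_size: int, epochs: int) -> int:
    """Approximate optimizer steps taken after `epochs` full passes over the data.

    Assumes one step per batch and that the last, partial batch is kept.
    """
    steps_per_epoch = math.ceil(num_train_samples / batch_size)
    return steps_per_epoch * epochs


# 47 training pairs as in the post; batch_size=16 is a hypothetical value.
total_steps = steps_for_epochs(num_train_samples=47, batch_size=16, epochs=500)
print(total_steps)
```

With so few samples, each epoch is only a handful of steps, so even several hundred epochs add up to a few thousand updates.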

BoredBean commented 1 year ago

I'm not sure about your issue, but I think you should always use the G model for generation. I have a similar question, so I'll post it here:

I trained mine for 600 epochs without a pre-trained model. Now I get something that sounds like a human voice, but with severe metallic noise. There are also lots of warnings like this:

/content/vits/utils.py:138: WavFileWarning: Chunk (non-data) not understood, skipping it.
  sampling_rate, data = read(full_path)

Is this normal, or should I re-collect the dataset and start over?
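
Editor's note: `WavFileWarning` is raised by `scipy.io.wavfile.read` when a WAV file contains a chunk it does not recognize, typically metadata written by an audio editor. The audio samples are normally still read. To see which chunks a file actually contains, a minimal sketch using only the standard library is shown below. The helper names and the `demo` chunk ID are hypothetical, and which chunk IDs trigger the warning depends on the SciPy version.

```python
import io
import struct
import wave


def list_riff_chunks(wav_bytes: bytes) -> list:
    """Return (chunk_id, size) for every top-level chunk in a RIFF/WAVE file."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12  # skip "RIFF", the 4-byte size field, and "WAVE"
    while pos + 8 <= len(wav_bytes):
        chunk_id, size = struct.unpack("<4sI", wav_bytes[pos:pos + 8])
        chunks.append((chunk_id.decode("ascii", errors="replace"), size))
        pos += 8 + size + (size & 1)  # chunk bodies are padded to even length
    return chunks


def make_demo_wav_with_extra_chunk() -> bytes:
    """Build a tiny silent WAV in memory and append a made-up 'demo' chunk."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(22050)
        w.writeframes(b"\x00\x00" * 100)  # 100 silent 16-bit samples
    plain = buf.getvalue()
    payload = b"metadata"  # 8 bytes of placeholder content
    extra = b"demo" + struct.pack("<I", len(payload)) + payload
    body = plain[8:] + extra  # everything after the RIFF size field
    return b"RIFF" + struct.pack("<I", len(body)) + body


chunks = list_riff_chunks(make_demo_wav_with_extra_chunk())
for chunk_id, size in chunks:
    print(repr(chunk_id), size)
```

For a real dataset file, pass `open(path, "rb").read()` to `list_riff_chunks`. Anything besides `fmt ` and `data` is extra metadata, which by itself would not be expected to cause metallic-sounding output.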