Open Seped69 opened 1 year ago
I'm not sure about your issue, but I think you should always use the G (generator) checkpoint when generating. I have a similar question, so I'll post it here:
I have trained mine for 600 epochs without a pre-trained model. Now I get something that sounds like a human voice, but with severe metallic noise. There are also lots of warnings like this:
/content/vits/utils.py:138: WavFileWarning: Chunk (non-data) not understood, skipping it.
sampling_rate, data = read(full_path)
Is this normal? Or should I recollect the dataset and start anew?
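For what it's worth, as far as I know that warning only means the WAV reader skipped a chunk it does not parse (typically metadata written by an audio editor), so on its own it should not affect the samples. Before recollecting anything, it may be worth checking that all the files share one format. Below is a minimal sketch using only Python's standard `wave` module; the function name `summarize_wavs` and the folder path are my own placeholders, not anything from the vits repo, and this only rules out inconsistent files rather than explaining the metallic noise.

```python
import wave
from collections import Counter
from pathlib import Path


def summarize_wavs(folder):
    """Count how many WAV files share each (sample_rate, channels, bytes_per_sample).

    Hypothetical helper for illustration; not part of the vits code base.
    Files that the standard-library reader cannot parse are counted under
    the key "unreadable".
    """
    formats = Counter()
    for path in sorted(Path(folder).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                key = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        except (wave.Error, EOFError):
            # Non-PCM or truncated files end up here.
            key = "unreadable"
        formats[key] += 1
    return formats


if __name__ == "__main__":
    # "wavs" is a placeholder path; point it at your own dataset folder.
    for fmt, count in summarize_wavs("wavs").items():
        print(fmt, count)
```

If more than one format shows up, converting everything to a single sample rate, mono, 16-bit PCM is probably a reasonable first step before retraining.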
I tried training a character voice with 51 audio-text pairs (47 in train.txt and 4 in val.txt) using the single-speaker setup, and changed the number of epochs to 500. The checkpoint files I got are G_3000, G_4000, D_3000, and D_4000. When I tried generating a voice with each of them, the generated audio was only static noise. Do you know why?
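On the file names: as far as I can tell from the training script, the number in G_4000 is the global training step at which the checkpoint was saved, not the epoch, so 500 epochs on a tiny dataset ending around step 4000 is not surprising by itself. A rough sketch of the arithmetic is below. The batch size of 6 is purely an illustrative assumption (check `batch_size` in your own config), and the real count can differ because the sampler groups clips by length.

```python
import math


def estimate_total_steps(num_train_items, batch_size, epochs):
    """Rough estimate of optimizer steps: batches per epoch times epochs.

    Illustrative helper only; it ignores any bucketing or padding the
    real data sampler does.
    """
    steps_per_epoch = math.ceil(num_train_items / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # 47 training items and 500 epochs come from the post above;
    # batch_size=6 is an assumed value for illustration.
    print(estimate_total_steps(num_train_items=47, batch_size=6, epochs=500))
```

A few thousand steps on 47 clips from scratch is a very small amount of training, which may be part of why the output is still noise.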