RVC-Project / Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!
MIT License
24.44k stars 3.61k forks

Questions about the pretrain #2319

Open ShiromiyaG opened 1 month ago

ShiromiyaG commented 1 month ago

How exactly was the pretrain trained? Was a different configuration used for its training? When I and other people tried to train a new model, horizontal lines appeared in the spectrograms of the audio generated with these pretrains. @RVC-Boss Were the speakers separated when the dataset was prepared? Was a different config used in the .json files?

Here is an image of the line problem: image

AznamirWoW commented 1 month ago

After examining multiple RVC forks, the result is the same: using the net_g Synthesizer as a blank slate (without loading weights from a voice model) to infer silent audio produces the following:

image

So a model trained from scratch starts with this heavy noise blended into its output, and perhaps after 1000+ epochs it gets diluted until only the solid 8 kHz noise remains.
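
For anyone who wants to check their own output for this kind of line without eyeballing a spectrogram, here is a minimal sketch in plain Python. It is not code from RVC or any fork: the function names, the 512-sample frame, the 32 kHz sample rate, and the "10x the median" threshold are all assumptions chosen for illustration. It averages the magnitude spectrum over frames and reports frequencies that stand out as persistent narrow peaks. The synthetic signal (a faint 8 kHz tone in noise) only stands in for real generated audio.

```python
import cmath
import math
import random
import statistics


def frame_magnitudes(frame):
    """Magnitude spectrum (bins 0..N/2) of one Hann-windowed frame, naive DFT."""
    n = len(frame)
    # Periodic Hann window to limit leakage between bins.
    xs = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    # Precomputed twiddle factors; exp(-2j*pi*k*i/n) == twiddle[(k * i) % n].
    twiddle = [cmath.exp(-2j * math.pi * j / n) for j in range(n)]
    return [
        abs(sum(x * twiddle[(k * i) % n] for i, x in enumerate(xs)))
        for k in range(n // 2 + 1)
    ]


def find_spectral_lines(signal, sample_rate, frame_size=512, factor=10.0):
    """Return frequencies (Hz) whose time-averaged magnitude is a local peak
    more than `factor` times the median bin. Threshold is an assumption."""
    frames = [
        signal[s:s + frame_size]
        for s in range(0, len(signal) - frame_size + 1, frame_size)
    ]
    spectra = [frame_magnitudes(f) for f in frames]
    avg = [sum(col) / len(spectra) for col in zip(*spectra)]
    floor = statistics.median(avg)
    lines = []
    for k, m in enumerate(avg):
        left = avg[k - 1] if k > 0 else 0.0
        right = avg[k + 1] if k < len(avg) - 1 else 0.0
        if m > factor * floor and m >= left and m >= right:
            lines.append(k * sample_rate / frame_size)
    return lines


def make_test_signal(sample_rate=32000, length=4096, tone_hz=8000.0, seed=0):
    """Synthetic stand-in for generated audio: faint tone plus Gaussian noise."""
    rng = random.Random(seed)
    return [
        0.1 * math.sin(2 * math.pi * tone_hz * i / sample_rate) + rng.gauss(0.0, 0.01)
        for i in range(length)
    ]


if __name__ == "__main__":
    sr = 32000
    print(find_spectral_lines(make_test_signal(sr), sr))
```

On the synthetic signal this should report a single line at 8000 Hz. To use it on real output you would load the samples yourself (for example with the standard `wave` module) and pass them in as a list of floats; the naive DFT is slow, so keep the clip short or swap in an FFT library.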