ShiromiyaG opened 1 month ago
After examining multiple RVC forks, the result is the same: using the net_g Synthesizer as a blank slate (without loading weights from a voice model) to infer silent audio produces the following:
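For anyone who wants to reproduce the blank-slate test, here is a minimal sketch. It assumes mainline RVC's module layout (`infer.lib.infer_pack.models`) and the stock v2 48k hyperparameters; the class name, constructor argument order, and `infer()` signature differ between forks and versions, so treat all of it as illustrative rather than exact:

```python
# Blank-slate test sketch: instantiate net_g with random weights (no
# checkpoint loaded) and run inference on "silence". Hyperparameter
# values mirror the stock v2 48k config .json; adjust to your config.
import torch
from infer.lib.infer_pack.models import SynthesizerTrnMs768NSFsid

net_g = SynthesizerTrnMs768NSFsid(
    1025,           # spec_channels (filter_length // 2 + 1)
    32,             # segment_size // hop_length (only used in training)
    192, 192, 768,  # inter_channels, hidden_channels, filter_channels
    2, 6, 3, 0,     # n_heads, n_layers, kernel_size, p_dropout
    "1",            # resblock
    [3, 7, 11],
    [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
    [12, 10, 2, 2], # upsample_rates
    512,            # upsample_initial_channel
    [24, 20, 4, 4], # upsample_kernel_sizes
    109, 256,       # spk_embed_dim, gin_channels
    48000,          # sr
)
net_g.eval()  # deliberately NOT loading any voice-model weights

# Stand-in features for ~1 s of silence: zeroed content features,
# zero pitch (unvoiced), speaker id 0.
T = 100
phone = torch.zeros(1, T, 768)               # HuBERT-style content features
phone_lengths = torch.LongTensor([T])
pitch = torch.zeros(1, T, dtype=torch.long)  # coarse f0 (0 = unvoiced)
nsff0 = torch.zeros(1, T)                    # continuous f0
sid = torch.LongTensor([0])

with torch.no_grad():
    audio = net_g.infer(phone, phone_lengths, pitch, nsff0, sid)[0][0, 0]
# `audio` is whatever the randomly initialized decoder emits for silence,
# i.e. the heavy noise described above.
```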
So training a model from scratch results in this heavy noise being blended in, and perhaps at 1000+ epochs it gets diluted until only the solid 8 kHz noise remains.
How exactly was the pretrained model trained? Was a different configuration used for training? Because when I and other people tried to train a new model, there was a problem with lines in the spectrum of the audio generated with these pretrains. @RVC-Boss, were the speakers separated when the dataset was prepared? Was a different config used in the .json files?
Here is an image of the line problem:
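For anyone who wants to check their own outputs for the same artifact, the lines show up as constant horizontal bands in a log-magnitude spectrogram (e.g. the solid band around 8 kHz mentioned above). A minimal plotting sketch, assuming `soundfile`, `scipy`, and `matplotlib` are installed; the filename `output.wav` is a placeholder for any clip inferred with these pretrains:

```python
# Plot a log-magnitude spectrogram and look for horizontal bands.
import numpy as np
import matplotlib.pyplot as plt
import soundfile as sf
from scipy.signal import stft

audio, sr = sf.read("output.wav")   # placeholder: any clip from the pretrain
if audio.ndim > 1:
    audio = audio.mean(axis=1)      # downmix to mono

f, t, Z = stft(audio, fs=sr, nperseg=2048, noverlap=1536)
db = 20 * np.log10(np.abs(Z) + 1e-8)  # magnitude in dB

plt.pcolormesh(t, f, db, shading="auto")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("Constant horizontal bands = the line artifact")
plt.colorbar(label="dB")
plt.show()
```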