PlayVoice / whisper-vits-svc

Core Engine of Singing Voice Conversion & Singing Voice Clone
https://huggingface.co/spaces/maxmax20160403/sovits5.0
MIT License
2.6k stars, 919 forks

BigVGAN and Bigger Sampling Rates #71

Closed alefiury closed 1 year ago

alefiury commented 1 year ago

Congratulations on the repository! It's incredibly useful!

I have a question about training the BigVGAN model. It was originally trained at 22kHz and 24kHz. Given that the vocoder in the so-vits pipeline is trained jointly with the acoustic model, is it possible to achieve satisfactory results by training at 44kHz, as done in the so-vits-fork repository, without first pre-training the vocoder at 44kHz? I noticed that this repository offers the option to train at 32kHz. In that case, was the BigVGAN checkpoint used here also pre-trained at 32kHz?

MaxMax2016 commented 1 year ago

This project uses 32kHz only. BigVGAN needs more GPU memory to train, and I failed to train at anything higher than 32kHz. The pretrained model is also 32kHz.
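
As a rough illustration of why higher sampling rates cost more memory, the sketch below counts how many waveform samples the vocoder must generate for a training segment of fixed duration. The 0.5 second segment length and the helper name `samples_per_segment` are assumptions made for this example and are not taken from this repository's configuration. The only point is that the output length grows linearly with the sampling rate, so activation memory in the generator can be expected to grow at least roughly in proportion.

```python
def samples_per_segment(sample_rate_hz: int, segment_seconds: float) -> int:
    """Number of waveform samples in a segment of the given duration."""
    return int(round(sample_rate_hz * segment_seconds))


# Hypothetical segment duration, chosen only for illustration.
SEGMENT_SECONDS = 0.5

# Rates mentioned in this thread: BigVGAN's original 22kHz/24kHz,
# this project's 32kHz, and the 44kHz used by so-vits-fork.
RATES_HZ = [22050, 24000, 32000, 44100]

baseline = samples_per_segment(32000, SEGMENT_SECONDS)

for rate in RATES_HZ:
    n = samples_per_segment(rate, SEGMENT_SECONDS)
    # Ratio of output length relative to the 32kHz setup.
    print(f"{rate} Hz: {n} samples per segment, {n / baseline:.2f}x vs 32 kHz")
```

Under these assumptions a 44.1kHz segment is about 1.38 times as long as a 32kHz one. Actual memory use also depends on batch size, the generator's upsampling layout, and the discriminators, none of which this estimate models.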