PlayVoice / whisper-vits-svc

Core Engine of Singing Voice Conversion & Singing Voice Clone
https://huggingface.co/spaces/maxmax20160403/sovits5.0
MIT License

Another sample rate support? #50

Closed: futorio closed this issue 1 year ago

futorio commented 1 year ago

Hello! Does this repository support training at another sample rate, such as 44100?

MaxMax2016 commented 1 year ago

Not supported. Training at a high sample rate needs more GPU memory. Maybe you can try so-vits-svc-fork.

futorio commented 1 year ago

I have 24 GB of VRAM on an RTX 4090 card. Will that not be enough? I want to try training on the bigvgan-large-v2 branch.

MaxMax2016 commented 1 year ago

My GPU has 10 GB, so I am not able to develop high sample rate support. You can try.

futorio commented 1 year ago

So to switch training to 44100, I need to change the "-s 44100" flag in the preprocess_a.py script and the "sampling_rate" parameter in configs/base.yaml?

MaxMax2016 commented 1 year ago

More work is needed. This project uses a 10 ms frame; at 44100, 10 ms means a hop_size of 441, which cannot work. You could use https://github.com/svc-develop-team/so-vits-svc, which supports large-v2, bigvgan, and 44100, or you can modify it to work like this project.
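
The hop-size arithmetic in the reply above can be checked with a short Python sketch. The helper names `hop_size` and `prime_factors` are made up for illustration and are not part of this repository. The thread does not say why a hop of 441 is unworkable; the factorization is printed only for comparison, on the assumption (not confirmed here) that the model's upsampling stages must multiply to the hop size.

```python
def hop_size(sample_rate_hz: int, frame_ms: int = 10) -> int:
    """Return the number of samples covered by one frame of frame_ms milliseconds."""
    samples, remainder = divmod(sample_rate_hz * frame_ms, 1000)
    if remainder:
        raise ValueError("frame does not span a whole number of samples")
    return samples


def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n in ascending order (trial division)."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)
    return factors


if __name__ == "__main__":
    for sample_rate in (32000, 44100):
        hop = hop_size(sample_rate)
        print(sample_rate, hop, prime_factors(hop))
```

At 32000 the hop is 320 (2^6 * 5), while at 44100 it is 441 (3^2 * 7^2), so any settings tied to the hop size would have to change along with the sample rate.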

futorio commented 1 year ago

Thank you for the answers. Can I use the pretrained models from bigvgan-large-v2 in so-vits-svc? I have noticed that the model file formats differ between the repositories.

MaxMax2016 commented 1 year ago

The pretrained model is 32k.

futorio commented 1 year ago

Thanks for all the answers!

capric98 commented 1 year ago

Though the issue is solved, I still want to leave a comment. IMHO a 32k sample rate is more than enough for pure vocal data, since humans cannot produce voice frequencies above 2k. In that case, 2*2k = 4k is theoretically enough.
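
The 2*2k = 4k step is the Nyquist relation: a sample rate must be at least twice the highest frequency to be represented. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and the 2 kHz ceiling is the commenter's figure, taken as given and not verified here.

```python
def min_sample_rate(max_freq_hz: float) -> float:
    """Lowest sample rate that can represent content up to max_freq_hz."""
    return 2.0 * max_freq_hz


def nyquist_limit(sample_rate_hz: float) -> float:
    """Highest frequency representable at sample_rate_hz."""
    return sample_rate_hz / 2.0


if __name__ == "__main__":
    # Commenter's assumed ceiling for the voice: 2 kHz.
    print(min_sample_rate(2000))   # 4000.0
    # Frequency ceilings of the two sample rates discussed in the thread.
    print(nyquist_limit(32000))    # 16000.0
    print(nyquist_limit(44100))    # 22050.0
```

A 32k sample rate therefore covers content up to 16 kHz, and 44100 extends that to 22.05 kHz.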

futorio commented 1 year ago

Yes, I will try a 32k sample rate.