HRNPH opened this issue 1 year ago
Instead of using TTS and SoVits together, why don't we just use TTS VITS?
The reason is that training your own custom-voice TTS model is hard, while voice conversion is much easier to train, since it only needs audio samples rather than audio-text pairs.
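To make the difference concrete, here's a rough sketch (not this project's actual data-loading code; the paths and helper names are made up): a VITS-style TTS model needs every clip paired with its transcript, while a SoVits-style VC model can train on audio alone.

```python
# Hypothetical illustration of the dataset-requirement difference.
from pathlib import Path

def build_tts_manifest(wav_dir: str, transcripts: dict[str, str]) -> list[str]:
    """TTS training: every clip must be paired with its text transcript."""
    lines = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        text = transcripts.get(wav.stem)
        if text is None:
            continue  # clips without a transcript are unusable for TTS
        lines.append(f"{wav}|{text}")
    return lines

def build_vc_filelist(wav_dir: str) -> list[str]:
    """Voice conversion training: audio clips of the target speaker are enough."""
    return [str(wav) for wav in sorted(Path(wav_dir).glob("*.wav"))]
```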
Okay, I understand. Right now I'm trying to retrain the model. Maybe I'll start from a pretrained model and then fine-tune it on the existing dataset.
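In case it helps, the "load a pretrained checkpoint, then fine-tune" flow looks roughly like this in plain PyTorch. This is not so-vits-svc's actual training entry point; the model class, checkpoint layout, loss, and dataset are placeholders for illustration only.

```python
# Generic fine-tuning sketch (assumptions: checkpoint stores weights under "model",
# the dataset yields (input, target) pairs, and an L1 loss stands in for the real one).
import torch
from torch.utils.data import DataLoader

def finetune(model: torch.nn.Module, ckpt_path: str, dataset, epochs: int = 10):
    state = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state["model"], strict=False)  # assumed checkpoint layout
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lower LR for fine-tuning
    loader = DataLoader(dataset, batch_size=16, shuffle=True)
    model.train()
    for _ in range(epochs):
        for batch, target in loader:
            optimizer.zero_grad()
            loss = torch.nn.functional.l1_loss(model(batch), target)  # placeholder loss
            loss.backward()
            optimizer.step()
    return model
```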
I have pre-trained model and dataset if you need one
OK, I'll give it a try. Can you send me the link?
This is for the dataset: https://drive.google.com/file/d/102PVFeKrYu8Pfo-4NBC5jsgVbHNiP_Uc/view?usp=share_link
This is for the SoVits model: https://drive.google.com/drive/folders/1-5YkGLEmLvfFEE1x5cG08KdDrrIWf8pL?usp=share_link (this is the same model that is used as the current base model)
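One possible way to pull these down locally is with the gdown package (`pip install gdown`); the output paths here are just examples, not paths the project expects.

```python
import gdown

# Dataset (a single shared file); "dataset.zip" is an assumed output name.
gdown.download(
    "https://drive.google.com/file/d/102PVFeKrYu8Pfo-4NBC5jsgVbHNiP_Uc/view?usp=share_link",
    output="dataset.zip",
    fuzzy=True,  # lets gdown extract the file id from a share link
)

# SoVits base model (a shared folder).
gdown.download_folder(
    "https://drive.google.com/drive/folders/1-5YkGLEmLvfFEE1x5cG08KdDrrIWf8pL?usp=share_link",
    output="sovits_base_model",
    quiet=False,
)
```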
Basically, the current Voice Conversion (SoVits) models pushed in #26 #27 are poor due to the low amount of training. The checkpoint is currently stored on Huggingface: https://huggingface.co/openwaifu/SoVits-VC-Chtholly-Nota-Seniorious-0.1, and it sounds worse than our old pipeline. What we need to do is just train it more and maybe add more VC models.
Todo:
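For anyone who wants to grab the current checkpoint and continue training from it, a minimal sketch using huggingface_hub's snapshot_download (the local directory layout is simply whatever the hub repo contains):

```python
from huggingface_hub import snapshot_download

# Fetch the current (under-trained) VC checkpoint to use as a starting point.
local_dir = snapshot_download(repo_id="openwaifu/SoVits-VC-Chtholly-Nota-Seniorious-0.1")
print("Model files downloaded to:", local_dir)
```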