Asking about the long-promised RVCv3 pre-trained model

GabryB03 commented 4 weeks ago

Hello!

I don't know if somebody has already asked this question, but I'm here to ask the official developers of this project (@RVC-Boss @fumiama @yxlllc) whether there is an ETA or a plan for releasing the new RVCv3 pre-trained model, which the README promises:

"Please look forward to the base model of RVCv3 with larger parameters, larger dataset, better effects, basically flat inference speed, and less training data required."

Consider that this project is followed by an entire community of enthusiasts, developers, and ordinary users who are unable, or lack the resources, to train their own models, so they are probably waiting for an update from you.

I admire the hard work that you put into this project almost every day, so much so that my friends and I created a Discord community with the sole objective of using RVC for professional voice cloning with the best AI tools available in the open-source community (Resemblyzer, UVR5, and Demucs, to name a few).

I hope that everything is going well, and I hope to get a response from your team. Thank you so much for your awesome work. See you soon and stay well.

NOTE: A useful issue about the possibility of changing the HuBERT model: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/1542 (it could be a useful addition to the RVCv3 architecture)

yxlllc commented 3 weeks ago

We actually have some experimental models, but the performance improvements have not met expectations. A key point may be that the quality of the pre-trained ContentVec model (hubert_base.pt) caps the performance we can reach, so we are now considering retraining a stronger ContentVec model. That training is very difficult, however, and requires a large amount of data and computing power.

GabryB03 commented 3 weeks ago

I understand the issue that you are facing with ContentVec. To get sufficient computing power, you could try training the new models on a rented machine (https://vast.ai/) with powerful GPUs and CPUs, though this option depends on your budget and on what you need for the training. I hope you succeed, and I wish you the best of luck with the continuation of the project.

Lukysoon commented 2 weeks ago

Hi @yxlllc, I think we could help you overcome these obstacles and move forward.

I am trying to train a better hubert_base.pt (with ContentVec) because we realized that we have reached the technical limits of RVC.

I'm at the stage where I have prepared a script for fine-tuning HuBERT with ContentVec, and I'm running experiments with it.

I would like to help you with the training. I can help with the computing, and I also have some training scripts for HuBERT that I can share with you. Please contact me here: lukysoon@pm.me

Thank you very much for your work. We all appreciate it!

GabryB03 commented 2 weeks ago

@Lukysoon thank you so much for your contribution to RVC!

I very much hope that this will help the developers get better results for the new RVCv3 architecture.

JackVinati commented 1 week ago

Why not use speaker-invariant clustering (SPIN) instead of ContentVec/HuBERT?

https://arxiv.org/pdf/2403.06260

Lukysoon commented 1 week ago

Is there any implementation of this? Have you already tried it with RVC?

JackVinati commented 1 week ago

"Is there any implementation of this? Have you already tried it with RVC?"

There is already a repo with pretrained models: https://github.com/vectominist/spin

I would like to know what developers think about it. @RVC-Boss @fumiama @yxlllc

P.S. I want to try training it on an A100 80 GB and see the results.

MethanJess commented 1 day ago

"We actually have some experimental models, but the performance improvements have not met expectations."

@yxlllc Well, could you share one of them anyway, and just call it "RVC 2.5" or something?

RVC-Project / Retrieval-based-Voice-Conversion-WebUI

Asking about the long-promised RVCv3 pre-trained model #2013