RVC-Project / Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!
MIT License

Issue when training a model, please help me #1028

Closed coldasicee closed 6 months ago

coldasicee commented 1 year ago

https://i.imgur.com/FJ8VZ5h.png

Is it stuck or broken? I've only used A1111 Stable Diffusion before, and I could track the progress there. When I opened Task Manager, the browser was using about 10-20% of the GPU, sometimes dropping to 0%.

I think I have everything installed: I downloaded the release pack from this GitHub, installed Python and CUDA, and ran:

pip install torch torchvision torchaudio
curl -sSL https://install.python-poetry.org | python3 -

Well, it looks like everything should be good to go, so please help, haha. Also, none of the paths contain spaces, underscores, or hyphens.

Thanks, guys

I'm trying to learn this because my sister-in-law's father has one of those diseases where the body degrades and there is no stopping it. He can no longer get out of bed, is fed through a tube, and has lost the ability to talk. He communicates using a notebook with Tobii and some custom programs they bought from a doctor, but they want to give him the chance to speak with his own voice again, so my plan is to train a model on his voice and then somehow adapt it for TTS.

Hope you can help us

pbanuru commented 1 year ago

Can you send the output from the commandline? Preferably in text, rather than image form. It may provide more information than what is shown in just that window.
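One way to get that output as text is to launch the WebUI through a small wrapper that copies everything it prints into a log file. This is only a sketch: `run_and_log` is a hypothetical helper, and the script name in the usage comment (`infer-web.py`) is an assumption about how the WebUI is started on your machine, so substitute whatever command you actually run.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(command, log_path):
    """Run `command`, echo its output to the console, and save a text copy.

    `command` is a list such as [sys.executable, "some_script.py"].
    Returns the exit code of the process.
    """
    log_path = Path(log_path)
    with log_path.open("w", encoding="utf-8") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge error messages into the same stream
            text=True,
            encoding="utf-8",
            errors="replace",  # do not crash on odd characters in the output
        )
        for line in process.stdout:
            sys.stdout.write(line)  # still visible while training runs
            log.write(line)         # and saved so it can be pasted into an issue
        return process.wait()


# Example usage (the script name is an assumption, adjust to your setup):
# run_and_log([sys.executable, "infer-web.py"], "webui_log.txt")
```

The resulting text file can then be pasted into the issue instead of a screenshot.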

Rokkonrol commented 1 year ago

I wouldn't even bother with Poetry—I had numerous issues when I went that route. Just use the pip install -r requirements.txt command instead. Also, like pbanuru said, posting your logs and errors in text form would be beneficial for others to provide better help.

coldasicee commented 1 year ago

I'm really sorry, guys. I went back to work on it to grab the output, but I also decided to remove the spaces from the audio file names, and that seems to have fixed it.
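For anyone who lands here with the same symptom, removing spaces from the audio file names can be scripted. The sketch below is an illustration only: `remove_spaces_from_filenames` is a hypothetical helper, the folder path in the usage comment is made up, and this thread only reports that removing spaces seemed to help, not why.

```python
from pathlib import Path


def remove_spaces_from_filenames(folder, replacement="", dry_run=True):
    """Rename files in `folder` whose names contain spaces.

    Returns a list of (old_path, new_path) pairs.
    With dry_run=True (the default) nothing is renamed, so the plan
    can be reviewed first.
    """
    renames = []
    planned = set()
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or " " not in path.name:
            continue
        target = path.with_name(path.name.replace(" ", replacement))
        # Refuse to overwrite an existing file or collide with another rename.
        if target.exists() or target in planned:
            raise FileExistsError(f"{target.name} would clash; rename {path.name} by hand")
        planned.add(target)
        if not dry_run:
            path.rename(target)
        renames.append((path, target))
    return renames


# Example usage (the folder name is hypothetical):
# for old, new in remove_spaces_from_filenames("C:/voicedata", dry_run=True):
#     print(old.name, "->", new.name)
```

Running it once with `dry_run=True` and checking the printed pairs before renaming for real keeps the original recordings safe.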

thank you very much for your time anyway!

By the way, do you have other training options I could try? Any good written tutorials, or even YouTube ones, as long as they explain things well, including what each setting does (I don't like those "follow me" tutorials, haha)? I'm also looking for programs or GitHub projects that would let me use the trained model with TTS.

Rokkonrol commented 1 year ago

As far as I know, there is no way to directly use an RVC-trained voice model for text-to-speech. RVC is for voice-to-voice conversion only.

There may be a way to convert the timbre/tone of a pretrained TTS voice model using an RVC model. For example, you could generate a sentence with a TTS model, then process that audio via a program similar to RVC into the voice you want. I haven't found a way to easily do this though since it would likely require some sort of Python script to send the TTS audio into RVC for conversion. Also, this wouldn't be a viable solution since it can't be done on a tablet and would have to be processed on a PC with a dedicated GPU.
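The two-step idea above (TTS first, then voice conversion) could be glued together roughly like this. It is a sketch of the plumbing only: `speak_in_target_voice` is a hypothetical helper, and `synthesize` and `convert` are placeholders that you would have to implement yourself around whatever TTS engine and conversion tool you end up using. Nothing here is an actual RVC or TTS API.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Both callables are hypothetical placeholders supplied by the caller:
#   synthesize(text, wav_path)  writes spoken `text` to wav_path
#   convert(in_wav, out_wav)    writes the voice-converted audio to out_wav
Synthesize = Callable[[str, Path], None]
Convert = Callable[[Path, Path], None]


def speak_in_target_voice(
    text: str,
    output_wav: Path,
    synthesize: Synthesize,
    convert: Convert,
) -> Path:
    """Generate speech with any TTS voice, then convert it to the target voice."""
    output_wav = Path(output_wav)
    with tempfile.TemporaryDirectory() as tmp:
        tts_wav = Path(tmp) / "tts.wav"
        synthesize(text, tts_wav)      # step 1: text -> generic TTS audio
        if not tts_wav.exists():
            raise RuntimeError("TTS step produced no audio file")
        convert(tts_wav, output_wav)   # step 2: generic voice -> target voice
    if not output_wav.exists():
        raise RuntimeError("conversion step produced no audio file")
    return output_wav
```

Keeping the two steps as plain functions means either one can be swapped out later without touching the other.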

I only recently began researching custom TTS models, so my knowledge of what's out there is fairly limited. There are a few options for local TTS inference. The most promising one I've found so far is Coqui, though I haven't tried it yet, so I can't comment on whether it's good. The problem, as before, is that it would require a dedicated GPU.

If you're looking for an easy-to-use TTS platform for custom voices, I recommend using a cloud-based solution like ElevenLabs. The downside is that it's a paid service, but it's great at what it does. ~2 hours of generated audio per month is $22 USD/month and ~10 hours of generated audio per month is $100 USD/month. It requires only one minute of audio to train a voice, and the quality of the trained voice is quite good. This would be a great solution since it doesn't require the uploaded audio to include specific spoken sentences. Since your sister-in-law's father can no longer speak, you could use any past audio of his voice for training. Best of all, it can be used on a tablet (or any device with a web browser).

As far as I'm aware there is no free service similar to ElevenLabs, but if you do end up finding one then let me know. Best of luck! @coldasicee

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 15 days since being marked as stale.