Open KaikeWesleyReis opened 5 months ago
Hey, I understand your thinking, and what you're doing is totally fine. But I have to give you a reality check. Yes, SVC can be used for audio replacement, but it seems like you're over-engineering it. You could just use TTS projects like GPT-Sovits instead of converting an existing TTS.
If you want to do voice conversion, then this project is definitely fine, but if your goal is just TTS, then GPT-Sovits is sufficient.
@ShadowLoveElysia
> If you want to do voice conversion, then this project is definitely fine, but if your goal is just TTS, then GPT-Sovits is sufficient.
Does GPT-Sovits follow the same idea as the VITS fine-tuning I have already done? Don't you think I'll run into the same problems I had with VITS fine-tuning?
My voice is this: https://www.youtube.com/watch?v=YZt6NKrkdzQ&
Given the nature of the voice, do you believe it is possible to fine-tune GPT-Sovits?
Hi, I'm developing a personal project: a conversational chatbot. The idea is quite simple: have a chat with Harbinger, the first Reaper (from the Mass Effect series). I found an optimal solution for generating his voice from text: use vits-ljspeech-base from Coqui TTS (without any fine-tuning) to generate the audio, then use your fine-tuned SVC model to apply the voice to the generated audio. For example, given this sentence:
Organic intellect, fascinated by the patterns of the universe. I, Harbinger, have witnessed the harmony of numbers governing the cosmos. The intricate dance of primes, the elegance of elliptic curves, and the recursion of Fibonacci's sequence all resonate with my being. Which aspect of number theory would you like to dissect, researcher?
I have these times for each step:
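For anyone who wants to reproduce this kind of per-step measurement, here is a minimal sketch of how the two stages could be timed separately. The functions `synthesize_base_audio` and `convert_voice` are hypothetical stand-ins for the TTS call and the SVC inference call (they are not taken from either project), so swap in your real calls.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per pipeline step.
timings = {}

@contextmanager
def timed(step):
    """Record how long the enclosed block takes under the given step name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = timings.get(step, 0.0) + (time.perf_counter() - start)

# Hypothetical stand-in for the base TTS model: returns one dummy sample per character.
def synthesize_base_audio(text):
    return [0.0] * len(text)

# Hypothetical stand-in for SVC inference: returns a transformed copy of the samples.
def convert_voice(samples):
    return [s * 0.5 for s in samples]

text = "Organic intellect, fascinated by the patterns of the universe."

with timed("tts"):
    base_audio = synthesize_base_audio(text)

with timed("svc"):
    converted = convert_voice(base_audio)

for step, seconds in timings.items():
    print(f"{step}: {seconds:.4f} s")
```

Timing each stage on its own makes it clear whether the TTS model or the SVC step dominates before trying to optimize either one.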
Now I'm studying the inference code of your model and so far I have the following ideas:
Is it possible to skip or pre-generate any of the feature vectors, to reduce the inference done by the other models (Whisper, HuBERT, pitch and so on) and thus the SVC inference time?
Btw, thanks for your repository: it is the easiest "prepare your data and run" project I have found so far in the deep learning field.
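To make the "pre-generate" idea concrete, here is a minimal sketch of an on-disk feature cache keyed by a hash of the input audio. It assumes the extracted features depend only on the input waveform, so it only helps when the exact same audio is converted more than once; newly generated TTS audio would still need one extraction pass. `FeatureCache` and `fake_extractor` are hypothetical names for illustration, not part of this repository's API.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

class FeatureCache:
    """Store extracted features on disk, keyed by a SHA-256 of the audio bytes."""

    def __init__(self, cache_dir, extractor):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.extractor = extractor  # callable: audio bytes -> features
        self.misses = 0             # number of times the extractor actually ran

    def _path_for(self, audio_bytes, name):
        digest = hashlib.sha256(audio_bytes).hexdigest()
        return self.cache_dir / f"{name}-{digest}.pkl"

    def get(self, audio_bytes, name="content"):
        path = self._path_for(audio_bytes, name)
        if path.exists():
            # Cache hit: skip the expensive extractor entirely.
            with path.open("rb") as f:
                return pickle.load(f)
        # Cache miss: run the extractor once and persist the result.
        self.misses += 1
        features = self.extractor(audio_bytes)
        with path.open("wb") as f:
            pickle.dump(features, f)
        return features

# Hypothetical stand-in for an expensive extractor (content encoder, pitch, ...).
def fake_extractor(audio_bytes):
    return [b / 255.0 for b in audio_bytes]

cache = FeatureCache(tempfile.mkdtemp(), fake_extractor)
audio = bytes([0, 51, 255])
first = cache.get(audio)   # runs the extractor
second = cache.get(audio)  # served from disk
print(cache.misses)
```

In a chatbot setting a cache like this would mostly pay off for repeated phrases; loading each auxiliary model once and keeping it in memory between requests is a separate saving that this sketch does not cover.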
Cheers from Brazil,