ArkanDash / Advanced-RVC-Inference

Advanced RVC Inference for quicker and effortless model downloads
MIT License

How to reproduce the pitch, intonation and other nuances of a voice? I would like to find a tutorial or a guide (intonation and pitch in RVC, or mastering RVC configuration) #9

Open GPU-server opened 4 days ago

GPU-server commented 4 days ago

I know how to reproduce a voice from a certain amount of data: clean it, then run RVC. But I know nothing about the index file, or anything related to n_fft, hop_length, win_length, n_mels, IVF, Flat, and so on.

I would like to learn not only how to reproduce someone's voice (vocal timbre), but also how to reproduce the pitch, intonation and other nuances of a voice.

Is that possible with RVC? If so, where can I learn that please?

ArkanDash commented 3 days ago

Sorry to break it to you, but I don't know much about those things either. Those variables are math that I don't understand yet, and they involve a lot of matrix data.

Iirc, index files are like external/additional voice data: the larger your dataset, the bigger the index file.
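To make the "additional voice data" idea a bit more concrete, here is a rough sketch of retrieval: a bank of stored feature vectors is searched for the closest matches to an incoming vector, and the result is blended back in. This is only an illustration under assumptions, not RVC's actual code. The names `nearest`, `blend` and `index_rate` are made up for the example, and the two-number vectors stand in for real audio features. The brute-force search here is what a "Flat" index does in a library such as FAISS, while an "IVF" index first groups the vectors into clusters so that fewer of them need to be compared.

```python
import math


def nearest(bank, query, k=2):
    """Return the k stored vectors closest to `query` (brute-force search)."""
    return sorted(bank, key=lambda vec: math.dist(vec, query))[:k]


def blend(query, bank, index_rate=0.75, k=2):
    """Mix the average of the k nearest stored vectors into `query`.

    index_rate=0 keeps the query unchanged; index_rate=1 replaces it
    entirely with the retrieved average. (Hypothetical parameter name.)
    """
    neighbours = nearest(bank, query, k)
    mean = [sum(column) / len(neighbours) for column in zip(*neighbours)]
    return [index_rate * m + (1 - index_rate) * q for m, q in zip(mean, query)]


# Toy "index": every stored vector stands for one frame of the training voice.
bank = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
print(blend([0.9, 0.9], bank, index_rate=0.5, k=1))
```

Since every frame of training audio adds one more vector to the bank, a larger dataset means a larger file.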

If you want to learn more, you could try asking the original RVC creator or you could try asking here https://discord.gg/aihub

GPU-server commented 3 days ago

Since you are the creator, can I ask you a direct question, please? (Maybe I am imagining that other people use this stuff and that I am missing out. Maybe only professionals, like the ones hired at ElevenLabs, know it? I don't know; I was hoping. I asked GPT about some of it, but it seemed too complicated, so I hoped someone could explain.) Anyway, my question: what would YOU (@ArkanDash) do to train the voice of Geralt of Rivia (example of his voice: https://www.youtube.com/watch?v=zxau03Row9U) in order to obtain a voice ALMOST EXACTLY like his? I would not mind a long answer; I can read as much as you can write, lol (even if that's 2 pages, heheh). Finally, where do you use the .pth file you obtain? Do you only use it in the same RVC software? Here is an important question:

  • Do you know ways to use the .pth model outside RVC, in other "voice techs", to introduce variations of all sorts? Please give me the best info you have (if you would rather share it via Discord or anything else, that's okay, just tell me). Again, the magic word: "please".

ArkanDash commented 3 days ago

First things first: I'm not the original RVC creator. Here is the original RVC project: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI

If you want to train a voice, you need a voice dataset. You could extract the voice lines from the game files or record the audio from cutscenes. "How much data do I need?" At least 10 minutes. It depends, though: a larger dataset sometimes gives better results than a smaller one, so you need to test it yourself. If your recordings have background sound, you could run them through UVR (Ultimate Vocal Remover; search for it on GitHub) before training.
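A quick way to check whether a folder of clips reaches the 10-minute mark is to add up the clip lengths. The sketch below assumes the dataset is a flat folder of uncompressed PCM `.wav` files, which is what Python's standard `wave` module can read. The function names and the `my_dataset` folder are made up for the example.

```python
import wave
from pathlib import Path

MIN_SECONDS = 10 * 60  # the "at least 10 minutes" rule of thumb from above


def wav_seconds(path):
    """Duration of one PCM WAV file in seconds (frame count / sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def dataset_seconds(folder):
    """Total duration of every .wav file directly inside `folder`."""
    return sum(wav_seconds(p) for p in sorted(Path(folder).glob("*.wav")))


def enough_data(folder, minimum=MIN_SECONDS):
    """True if the clips in `folder` add up to at least `minimum` seconds."""
    return dataset_seconds(folder) >= minimum
```

For example, `enough_data("my_dataset")` returns `False` until the clips in that folder total ten minutes or more.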

.pth files are created with the PyTorch library and contain the model data. A .pth file generated by RVC software can only be used with RVC software; it is not compatible with other software, because the data inside needs a specific handler.
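If you are curious what such a file looks like, recent PyTorch versions write a .pth as an ordinary zip archive holding a pickled object plus the raw tensor data. The sketch below only lists the entries, without loading anything, and uses nothing but the standard library. It assumes the file was saved in that zip-based format, and `describe_checkpoint` is a made-up helper name.

```python
import zipfile


def describe_checkpoint(path):
    """List the entries inside a .pth file if it is a zip archive.

    Returns an empty list for anything that is not a zip file, such as an
    older pickle-only checkpoint or a file that is not from PyTorch at all.
    """
    if not zipfile.is_zipfile(path):
        return []
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Seeing the entries does not make the model usable elsewhere: the program that loads it still needs the matching model code to know what the stored weights mean.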