CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Improve fidelity? #1090

Open hackerfactor opened 2 years ago

hackerfactor commented 2 years ago

First: I got it to work on the first try. Excellent instructions. (Most GitHub Python projects take days of battling to get all of the dependencies to work. This took minutes.) The only hang-up I hit was numba: I had to run pip install numba==0.51 (you need 0.51 or later).

Second: I gave it a long sample (20 minutes) of my own voice at 44.1kHz. (I had audio from a presentation that I had previously recorded.) The output it generates sounds like me. Very impressive.

Now for the issue: while the voice sounds like me, it has a grainy quality, like low-fidelity audio with lots of pops and clicks. Is there a way to improve the audio quality? I noticed that it outputs at 16kHz. Is there any way to raise that to 22kHz or higher?

raccoonML commented 2 years ago

It can be improved by training a new vocoder model from scratch on higher quality data. You can preprocess the dataset at a higher sampling rate, and the vocoder will output at that sample rate.
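As a rough illustration of the resampling step behind "preprocess the dataset at a higher sampling rate", here is a minimal sketch using SciPy's `resample_poly`. It is not this repository's preprocessing code. The function name `resample_to` and the 22.05 kHz target are hypothetical, and this sketch does not cover where the project itself configures its sample rate.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_to(wav: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample a mono waveform from orig_sr to target_sr (hypothetical helper).

    resample_poly upsamples by `up`, low-pass filters (anti-aliasing),
    then downsamples by `down`, so the ratio up/down must equal target/orig.
    """
    if orig_sr == target_sr:
        return wav
    g = gcd(orig_sr, target_sr)
    up, down = target_sr // g, orig_sr // g
    # Cast back so float32 input stays float32.
    return resample_poly(wav, up, down).astype(wav.dtype)


if __name__ == "__main__":
    # Hypothetical example: one second of a 440 Hz tone recorded at 44.1 kHz,
    # converted to an assumed preprocessing target of 22.05 kHz.
    orig_sr, target_sr = 44_100, 22_050
    t = np.arange(orig_sr) / orig_sr
    wav = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

    out = resample_to(wav, orig_sr, target_sr)
    print(len(out))  # 22050 samples, i.e. one second at 22.05 kHz
```

Resampling only changes the rate. Upsampling the existing 16 kHz output to 22.05 kHz would not restore anything above 8 kHz (the Nyquist limit at 16 kHz). That is why the higher rate has to come from the training data and a retrained vocoder.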