CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Lackluster Performance on GPU #1174

Open pmcanneny opened 1 year ago

pmcanneny commented 1 year ago

I get a performance boost when using a GPU, but inference is still much slower than I'd expect.

For context, inference can take up to 8 seconds, whereas I can generate 512x512 Stable Diffusion images in under 1 second on the same hardware.
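To narrow down where those seconds go, each stage could be timed separately with something like the rough sketch below. The stage names and the placeholder callables are assumptions for illustration only, not the repository's actual functions. On a GPU, `torch.cuda.synchronize` would need to be passed as `sync`, since CUDA kernels launch asynchronously and unsynchronized timings can be misleading.

```python
import time
from typing import Callable, Dict, Optional


def time_stages(
    stages: Dict[str, Callable[[], object]],
    sync: Optional[Callable[[], None]] = None,
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the best wall-clock time (seconds) observed for each stage.

    `sync` is an optional barrier called before and after each run,
    e.g. torch.cuda.synchronize when the stages run on a GPU.
    """
    timings: Dict[str, float] = {}
    for name, fn in stages.items():
        best = float("inf")
        for _ in range(repeats):
            if sync is not None:
                sync()  # make sure earlier GPU work has finished
            start = time.perf_counter()
            fn()
            if sync is not None:
                sync()  # wait for this stage's GPU work to finish
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


if __name__ == "__main__":
    # Hypothetical placeholders: swap in the real encoder, synthesizer,
    # and vocoder calls from your own inference script.
    demo_stages = {
        "encoder": lambda: time.sleep(0.01),
        "synthesizer": lambda: time.sleep(0.02),
        "vocoder": lambda: time.sleep(0.03),
    }
    for stage, seconds in time_stages(demo_stages).items():
        print(f"{stage}: {seconds:.3f} s")
```

If one stage accounts for most of the time, that is the one worth swapping or optimizing first.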

Is this issue still accurate? https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/1013

Is there an easy way for me to swap in a different vocoder to test for speed/quality improvements?