Lackluster Performance on GPU

I get a performance boost when running on the GPU, but inference is still much slower than I'd expect.

For context, inference can take up to 8 seconds, yet on the same hardware I can generate 512x512 Stable Diffusion images in under 1 second.
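A raw wall-clock figure is easier to compare across setups when expressed as a real-time factor (RTF): inference time divided by the duration of the audio produced. An RTF below 1.0 means audio is generated faster than real time. The sketch below is a minimal helper; the 5 seconds of output and the 16 kHz sample rate in the usage line are hypothetical numbers chosen for illustration, not measurements from this setup.

```python
def real_time_factor(inference_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Wall-clock inference time divided by the duration of the generated audio.

    RTF < 1.0 means the audio was produced faster than real time.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return inference_seconds / audio_seconds


# Hypothetical example: 8 s of inference producing 5 s of 16 kHz audio.
rtf = real_time_factor(8.0, num_samples=5 * 16_000, sample_rate=16_000)
print(f"RTF = {rtf:.2f}")  # RTF = 1.60
```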

Is there an easy way for me to swap in a different vocoder to test whether it improves speed or quality?
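One way to make vocoders swappable for testing is to treat each one as a plain callable from a mel spectrogram to a waveform and time them through a shared harness. The sketch below assumes that interface; it does not use this repository's actual vocoder API, and `fast_stub`, `slow_stub`, and `HOP_LENGTH` are hypothetical stand-ins so the harness runs without any model weights. A few warm-up calls are discarded so one-off costs (lazy initialization, kernel setup) don't skew the median.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence

# Assumed interface for this sketch: mel frames (each a list of floats) -> waveform samples.
Vocoder = Callable[[Sequence[Sequence[float]]], List[float]]


def benchmark_vocoders(vocoders: Dict[str, Vocoder], mel, warmup: int = 1, runs: int = 3) -> Dict[str, float]:
    """Return the median wall-clock seconds per call for each named vocoder."""
    results: Dict[str, float] = {}
    for name, vocode in vocoders.items():
        for _ in range(warmup):  # discard warm-up calls that include one-off startup costs
            vocode(mel)
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            vocode(mel)
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results


HOP_LENGTH = 200  # hypothetical number of output samples per mel frame


def fast_stub(mel):
    # Hypothetical stand-in vocoder: returns silence of the expected length.
    return [0.0] * (len(mel) * HOP_LENGTH)


def slow_stub(mel):
    # Hypothetical stand-in vocoder that simulates ~10 ms of extra work.
    time.sleep(0.01)
    return [0.0] * (len(mel) * HOP_LENGTH)


mel = [[0.0] * 80 for _ in range(100)]  # dummy input: 100 frames x 80 mel bins
timings = benchmark_vocoders({"fast": fast_stub, "slow": slow_stub}, mel)
for name, secs in timings.items():
    print(f"{name}: {secs * 1000:.1f} ms")
```

To compare real vocoders, each one would be wrapped in a function with the same signature and added to the dictionary. If the wrapped code runs on a GPU through PyTorch, note that CUDA kernels launch asynchronously, so the wrapper should call `torch.cuda.synchronize()` before returning; otherwise the timer can stop before the work has actually finished.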

CorentinJ / Real-Time-Voice-Cloning