auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
https://arxiv.org/abs/1905.05879
MIT License
983 stars, 207 forks

Very slow inference #58

Closed by samialsindi 3 years ago

samialsindi commented 3 years ago

Hi, impressive work! I tried to add a new speaker to the model (using your pretrained models) and went through the make spectrogram and make metadata steps; I had to edit train.pkl to match the structure of metadata.pkl. I think I've done everything right, but it takes 1 hour to produce 10 seconds of audio on a GPU instance (p2.xlarge). Is this expected, or am I doing something very wrong?

Thanks in advance

auspicious3000 commented 3 years ago

The slowness is due to the WaveNet vocoder. You can make inference much faster by using a different vocoder.
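
To put the reply above in numbers: an autoregressive WaveNet vocoder generates audio one sample at a time, with each sample depending on the previous ones, so the steps cannot be run in parallel across time. The rough sketch below works out what the reported timing implies per sample. The 16 kHz sampling rate is an assumption (it is not stated in this thread), and `autoregressive_cost` is a hypothetical helper written for illustration, not part of the autovc code.

```python
def autoregressive_cost(audio_seconds, sample_rate_hz, wall_clock_seconds):
    """Estimate the per-sample cost of sample-by-sample audio generation.

    Returns (number of samples, seconds spent per sample, real-time factor).
    """
    # One sequential generation step is needed for every output sample.
    n_samples = int(round(audio_seconds * sample_rate_hz))
    seconds_per_sample = wall_clock_seconds / n_samples
    # Real-time factor: seconds of compute per second of generated audio.
    real_time_factor = wall_clock_seconds / audio_seconds
    return n_samples, seconds_per_sample, real_time_factor


# Figures from the question: 10 s of audio in about 1 hour.
# ASSUMPTION: 16 kHz output audio (not stated in the thread).
n_samples, seconds_per_sample, real_time_factor = autoregressive_cost(
    audio_seconds=10, sample_rate_hz=16_000, wall_clock_seconds=3600
)

print(f"sequential steps:   {n_samples}")
print(f"time per sample:    {seconds_per_sample * 1000:.1f} ms")
print(f"real-time factor:   {real_time_factor:.0f}x slower than real time")
```

Under that assumption, 10 seconds of audio means 160,000 sequential steps at roughly 22.5 ms each, which is consistent with the bottleneck being the vocoder rather than the conversion model. A vocoder that produces many samples per forward pass avoids this per-sample loop.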