Getting voice conversion to work on short segments (~120 ms)

auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

https://arxiv.org/abs/1905.05879

MIT License


Closed rppravin closed 3 years ago

rppravin commented 3 years ago

Thanks for the code!

With the default settings, training uses ~2 s segments (with 1:16 downsampling at the bottleneck layer).

Is it possible to modify the code to get voice conversion working for ~120 ms segments? Would zero padding work?

Thanks in advance, Pravin

auspicious3000 commented 3 years ago

In this case, you don't need to modify the code. Just pad your segments to the nearest multiple of 16 frames.
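
For anyone who wants to see the padding step spelled out, here is a minimal sketch. It assumes the segment is a NumPy mel-spectrogram of shape `(frames, n_mels)` and that padding means rounding the frame count up to the next multiple of 16 with zeros. The helper name `pad_to_multiple` and the 8-frame, 80-bin example input are hypothetical, chosen only for illustration. How many frames a 120 ms segment actually yields depends on your hop size.

```python
import numpy as np


def pad_to_multiple(mel, base=16):
    """Zero-pad a (frames, n_mels) array along the time axis.

    Returns the padded array and the number of frames that were added,
    so the padding can be trimmed off again after conversion.
    """
    n_frames = mel.shape[0]
    # Round the frame count up to the next multiple of `base`.
    target = -(-n_frames // base) * base
    pad = target - n_frames
    # Pad only at the end of the time axis; leave the mel axis untouched.
    padded = np.pad(mel, ((0, pad), (0, 0)), mode="constant")
    return padded, pad


# Hypothetical short segment: 8 frames of 80 mel bins.
mel = np.random.rand(8, 80).astype(np.float32)
padded, pad = pad_to_multiple(mel)
print(padded.shape, pad)  # (16, 80) 8
```

After running the padded segment through the model, you would presumably drop the last `pad` frames of the output so that it lines up with the original segment again.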