auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
https://arxiv.org/abs/1905.05879
MIT License

Analysis window length (fft_length) and hop_length for feature extraction #91

Open rppravin opened 3 years ago

rppravin commented 3 years ago

Thanks for the code and replies to my previous questions!

In the code (make_spect.py), feature extraction uses an fft_length (analysis window length) of 1024 samples (64 ms) and a hop_length of 256 samples (16 ms) between windows.

I was wondering whether you tried other fft_lengths and hop_lengths. Could a shorter fft_length and hop_length help, say a 20 ms fft_length and a 10 ms hop_length, since they give a different time-frequency resolution trade-off for feature extraction?
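To make the trade-off concrete, here is a minimal sketch that compares the two settings. It assumes a 16 kHz sampling rate (implied by 1024 samples being 64 ms) and an FFT size equal to the window length, with no zero padding. The helper names (`samples_to_ms`, `ms_to_samples`, `describe`) are hypothetical and are not part of make_spect.py.

```python
# Assumption: 16 kHz sampling rate, inferred from 1024 samples == 64 ms.
SAMPLE_RATE_HZ = 16_000


def samples_to_ms(n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * n_samples / sample_rate_hz


def ms_to_samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to the nearest whole sample count."""
    return int(round(duration_ms * sample_rate_hz / 1000.0))


def describe(window_samples, hop_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Summarize the time-frequency trade-off of one window/hop setting.

    Assumes the FFT size equals the window length (no zero padding), so the
    spacing between frequency bins is sample_rate / window_samples.
    """
    return {
        "window_ms": samples_to_ms(window_samples, sample_rate_hz),
        "hop_ms": samples_to_ms(hop_samples, sample_rate_hz),
        "bin_spacing_hz": sample_rate_hz / window_samples,
        "frames_per_second": sample_rate_hz / hop_samples,
    }


# Setting described in the question: 1024-sample window, 256-sample hop.
current = describe(1024, 256)

# Proposed setting: 20 ms window, 10 ms hop.
proposed = describe(ms_to_samples(20), ms_to_samples(10))

for name, cfg in (("current", current), ("proposed", proposed)):
    print(name, cfg)
```

Under these assumptions, the shorter window gives finer time resolution (100 frames per second instead of 62.5) at the cost of coarser frequency bins (50 Hz instead of 15.625 Hz). Anything downstream that assumes the original frame rate would likely need to be adjusted to match.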

Thanks, Pravin

auspicious3000 commented 3 years ago

No, I didn't, but feel free to experiment with it.