Difference in calculating mel-spectrogram between AutoVC and vocoders

auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

MIT License

Hi author, I used the provided HiFi-GAN checkpoint to run inference on mel-spectrograms extracted with AutoVC's make_spect.py, and the output was a very low voice (the speaking speed was correct, though). What I'm not sure about is what the config.json for that checkpoint looks like. I noticed some small differences in how the mel-spectrograms are calculated that could be causing the issue: AutoVC passes fmin (as high as 90 Hz) and fmax to the mel filterbank, while the original HiFi-GAN didn't use these parameters. Could you share the config.json that was used to train the vocoder checkpoint? Thanks!
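To illustrate why the filterbank range might matter, here is a rough sketch that compares where the mel band centers land for two different (fmin, fmax) settings. Only the 90 Hz fmin comes from the description above. The fmax values, the 0 Hz fmin on the vocoder side, the 80-band count, and the HTK-style mel formula are all assumptions for illustration, and the helper names are hypothetical. The real values would have to come from make_spect.py and the vocoder's config.json.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale (one common convention; libraries may differ)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def band_centers(n_mels, fmin, fmax):
    """Center frequencies (Hz) of n_mels triangular bands spanning [fmin, fmax]."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)  # n_mels bands need n_mels + 2 edge points
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]

# Assumed settings: only fmin=90 Hz is taken from the issue text.
extractor = band_centers(80, fmin=90.0, fmax=7600.0)  # feature-extraction side
vocoder = band_centers(80, fmin=0.0, fmax=8000.0)     # vocoder-training side

# The same band index points at different frequencies in the two setups.
for i in (0, 20, 40, 79):
    print(f"band {i:2d}: extractor {extractor[i]:7.1f} Hz, vocoder {vocoder[i]:7.1f} Hz")
```

If the two sides disagree like this, the vocoder reads each band as energy at a different frequency than the one it was extracted from, so matching fmin, fmax, the number of mel bands, and the sampling rate between the two configs seems like the first thing to check.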

Difference in calculating mel-spectrogram between AutoVC and vocoders #105