auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
https://arxiv.org/abs/1905.05879
MIT License

Difference in calculating mel-spectrogram between AutoVC and vocoders #105

Closed Irislucent closed 2 years ago

Irislucent commented 2 years ago

Hi author, when I run inference with the provided HiFi-GAN checkpoint on mel-spectrograms extracted by AutoVC's make_spect.py, I get a very low voice (the speaking speed is correct, though). What I'm not sure about is what the config.json for that checkpoint looks like. I noticed some small differences in how the mel-spectrograms are calculated that could be causing the issue: AutoVC passes fmin (as high as 90 Hz) and fmax to the mel filterbank, while the original HiFi-GAN didn't use these parameters. So I'd like to know which config.json was used to train the vocoder checkpoint. Thanks!
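To make the fmin/fmax question concrete, here is a minimal sketch of how the mel band edges move when those two parameters change. Only the 90 Hz fmin comes from this thread; the 80 bands, the 7600 Hz fmax, and the 0 to 8000 Hz comparison range are assumptions for illustration, not values confirmed from either repository's config. The helper names are hypothetical, and the HTK mel formula is used for simplicity, so the numbers will not match a library that defaults to a different mel scale.

```python
import numpy as np

def hz_to_mel(f):
    """Convert Hz to mel using the HTK formula (a simplifying assumption)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_band_edges(n_mels, fmin, fmax):
    """Return the n_mels + 2 edge frequencies (Hz) of a triangular mel filterbank.

    Filter i spans edges[i]..edges[i + 2] and peaks at edges[i + 1].
    """
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    return mel_to_hz(mels)

# Assumed settings: only fmin=90 is taken from the discussion above.
edges_a = mel_band_edges(80, 90.0, 7600.0)   # extractor-side assumption
edges_b = mel_band_edges(80, 0.0, 8000.0)    # vocoder-side assumption

# Largest difference between corresponding filter peak frequencies.
max_shift = np.max(np.abs(edges_a[1:-1] - edges_b[1:-1]))
print(f"first peaks: {edges_a[1]:.1f} Hz vs {edges_b[1]:.1f} Hz")
print(f"largest peak shift: {max_shift:.1f} Hz")
```

If the extractor and the vocoder disagree on these settings, each mel bin covers a slightly different frequency range, so comparing the two band layouts like this is a quick way to check whether a mismatch exists before suspecting it as the cause.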

Irislucent commented 2 years ago

Oops, it turns out that I made a silly mistake... I used the wrong output path and kept downloading the old, wrong results. So the fmin issue shouldn't matter. But still, thanks if you clicked in.