Hi author,
When I use the provided HiFi-GAN checkpoint to run inference on mel-spectrograms extracted with AutoVC's make_spect.py, the output voice is very low (the speaking speed is correct, though). What does the config.json for that checkpoint look like?
I noticed some small differences in how the mel-spectrograms are computed that could be causing this. AutoVC passes fmin (as high as 90 Hz) and fmax to the mel filterbank, while the original HiFi-GAN, as far as I can tell, does not use these parameters. So I'd like to know which config.json was used to train the vocoder checkpoint.
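To illustrate why differing fmin/fmax settings could matter, here is a minimal pure-Python sketch that compares where the mel band centers fall under two settings. Only the 90 Hz fmin comes from this post; the other values (80 bands, fmax of 7600 Hz and 8000 Hz, fmin of 0 Hz) and the helper names are assumptions for illustration. It uses the HTK mel formula, which may differ from the scale a given library uses by default, so the exact numbers are indicative only.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; libraries may default to another variant)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_centers(n_mels, fmin, fmax):
    """Center frequencies (Hz) of n_mels bands evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    # n_mels + 2 edge points span [fmin, fmax]; the interior points are the centers
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * i) for i in range(1, n_mels + 1)]

# Hypothetical settings: only fmin = 90 Hz is taken from the post
centers_a = mel_band_centers(80, 90.0, 7600.0)
centers_b = mel_band_centers(80, 0.0, 8000.0)

# Print the center of the first, middle, and last band under each setting
for i in (0, 39, 79):
    print(i, round(centers_a[i], 1), round(centers_b[i], 1))
```

If the same bin index maps to different frequencies in the extractor and in the vocoder's training config, the vocoder would interpret the energy at the wrong frequency, which is one way a mismatch could distort the output.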
Thanks!
Oops, it turns out I made a silly mistake: I used the wrong output path and kept downloading the old, wrong results. So the fmin issue shouldn't matter. But thanks anyway if you clicked in.