Nistrian closed this issue 2 years ago
Hi @Nistrian,
> Hi. Your fastpitch makes me very happy.
Glad to hear that!
> The big difference in lengths interferes with finetuning with hifi-gan.
The only reasonable way to generate mel-spectrograms for HiFi-GAN finetuning is to use ground-truth pitch and ground-truth durations during synthesis with FastPitch. Otherwise the generated frames will not be time-aligned with the recorded audio, and the L2 loss in HiFi-GAN will not make any sense.
The easiest way to do this is to capture them here, because ground-truth conditioning is already used to calculate the L2 loss during training. You can dump the spectrograms to disk inside the training loop and comment out the model updates.
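To make the idea concrete, here is a minimal sketch of such a dump loop. It is not the actual FastPitch training code: `dump_teacher_forced_mels`, `toy_synthesize`, `length_regulate`, and the batch keys `pitch_tgt`, `mel_tgt`, and `name` are hypothetical stand-ins (only `dur_tgt` comes from the question), and the toy arrays are shaped `(frames, n_mels)` purely for illustration. It shows that when each input symbol is repeated by its ground-truth duration, the output has exactly as many frames as the original mel, and that nothing is updated while dumping.

```python
import os
import tempfile

import numpy as np


def length_regulate(token_feats, durations):
    """Repeat each token's feature vector durations[i] times along the time axis."""
    return np.repeat(token_feats, durations, axis=0)


def toy_synthesize(text, dur_tgt, pitch_tgt, n_mels=80):
    """Hypothetical stand-in for a FastPitch forward pass with ground-truth conditioning."""
    rng = np.random.default_rng(0)
    table = rng.standard_normal((256, n_mels))   # toy symbol embeddings
    feats = table[text] + pitch_tgt[:, None]     # add per-symbol pitch
    return length_regulate(feats, dur_tgt)       # expand to frame rate


def dump_teacher_forced_mels(batches, synthesize, out_dir):
    """Save one mel per utterance; there is deliberately no optimizer step here."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for batch in batches:
        mel = synthesize(batch["text"], batch["dur_tgt"], batch["pitch_tgt"])
        # With ground-truth durations the frame count matches the original mel.
        assert mel.shape[0] == batch["mel_tgt"].shape[0]
        path = os.path.join(out_dir, batch["name"] + ".npy")
        np.save(path, mel)
        paths.append(path)
    return paths


# Toy utterance: 3 symbols whose durations sum to 7 frames.
batch = {
    "name": "utt0001",
    "text": np.array([3, 17, 42]),
    "dur_tgt": np.array([2, 0, 5]),
    "pitch_tgt": np.array([0.1, 0.0, -0.3]),
    "mel_tgt": np.zeros((7, 80)),
}

with tempfile.TemporaryDirectory() as tmp:
    paths = dump_teacher_forced_mels([batch], toy_synthesize, tmp)
    saved = np.load(paths[0])
```

In the real training loop the same pattern would presumably apply: keep the forward pass that already receives the ground-truth durations and pitch, write its mel output to disk per utterance, and skip the backward pass and optimizer step.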
Hi. Your fastpitch makes me very happy. However, I ran into a problem: I am not happy with the mels I get. I read that I can use dur_tgt to produce more accurate mels, but I don't understand how to get them. Could you help me?
In case it matters, my goal is to keep the length of the generated mels as close as possible to the length of the original ones, which I calculated with extract_mels.py. The big difference in lengths interferes with finetuning with hifi-gan.