NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

[FastPitch 1.1/PyTorch] How to get target duration #1007

Closed Nistrian closed 2 years ago

Nistrian commented 3 years ago

Hi. Your fastpitch makes me very happy. However, I have a problem: I am not happy with the mels I get. I read that I can use dur_tgt to generate more accurate mels, but I don't understand how to obtain those target durations. Could you help me?

In case it matters, my goal is to keep the length of the generated mels as close as possible to the length of the original ones, which I computed with extract_mels.py. The big difference in lengths interferes with finetuning with hifi-gan.

alancucki commented 3 years ago

Hi @Nistrian,

> Hi. Your fastpitch makes me very happy.

Glad to hear that!

> The big difference in lengths interferes with finetuning with hifi-gan.

The only reasonable way to generate mel-spectrograms for HiFi-GAN finetuning is to synthesize them with FastPitch using ground truth pitch and ground truth durations. Otherwise the generated frames do not line up with the recorded audio, and the L2 loss in HiFi-GAN will not make any sense.
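
To illustrate why ground truth durations fix the length mismatch, here is a minimal sketch of duration-based upsampling, written in plain Python rather than taken from the FastPitch code. The function name `regulate_length` and all duration values are made up for the example. Each input token is repeated for as many mel frames as its duration says, so the output length is simply the sum of the durations.

```python
def regulate_length(token_encodings, durations):
    """Repeat each token encoding durations[i] times (one copy per mel frame)."""
    if len(token_encodings) != len(durations):
        raise ValueError("one duration per input token is required")
    frames = []
    for encoding, duration in zip(token_encodings, durations):
        frames.extend([encoding] * duration)
    return frames


# Hypothetical values, for illustration only.
tokens = ["h", "e", "l", "o"]
dur_tgt = [3, 5, 2, 6]   # ground truth durations, in mel frames
dur_pred = [3, 4, 2, 5]  # durations a predictor might output instead

gt_frames = regulate_length(tokens, dur_tgt)
pred_frames = regulate_length(tokens, dur_pred)

# With dur_tgt the output has exactly as many frames as the reference mel.
print(len(gt_frames), len(pred_frames))
```

With the target durations the output has 16 frames, matching the reference, while the predicted durations give 14 frames, which is the kind of drift that makes a frame-wise loss against the original audio meaningless.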

The easiest way to do this is to capture the spectrograms here, since training already uses ground truth conditioning to calculate the L2 loss. You can dump the spectrograms to disk inside the training loop and comment out the model updates.
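
A rough sketch of that dump loop could look like the following. It is not the FastPitch training script: `dump_mels`, `toy_forward`, the batch keys, and the utterance names are all hypothetical, and NumPy stands in for the real model so the example runs on its own. In the actual loop, the forward pass would be the model call that already receives ground truth durations and pitch.

```python
import os
import tempfile

import numpy as np


def dump_mels(forward_fn, batches, out_dir):
    """Run forward passes only and save one mel-spectrogram per utterance."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for batch in batches:
        # forward_fn is assumed to condition on ground truth durations and pitch.
        mel_out = forward_fn(batch)  # shape: (batch, n_mel, max_frames)
        for name, mel, n_frames in zip(batch["names"], mel_out, batch["mel_lens"]):
            path = os.path.join(out_dir, name + ".npy")
            np.save(path, mel[:, :n_frames])  # drop padding frames
            saved.append(path)
        # Deliberately no backward pass and no optimizer step here,
        # so the model weights stay unchanged while dumping.
    return saved


def toy_forward(batch):
    """Stand-in for the model: output length follows the target durations."""
    durations = batch["dur_tgt"]
    max_frames = int(durations.sum(axis=1).max())
    return np.zeros((durations.shape[0], 80, max_frames), dtype=np.float32)


# Hypothetical batch of two utterances, zero-padded to three tokens each.
batch = {
    "names": ["utt_a", "utt_b"],
    "dur_tgt": np.array([[3, 5, 2], [4, 4, 0]]),
    "mel_lens": [10, 8],
}

out_dir = tempfile.mkdtemp()
paths = dump_mels(toy_forward, [batch], out_dir)
shapes = [np.load(path).shape for path in paths]
print(shapes)
```

Each saved array keeps only the valid frames of its utterance, so its length matches the sum of that utterance's target durations.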