[FastPitch 1.1/PyTorch] How to get target duration

NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.


Hi @Nistrian,

> Hi. Your fastpitch makes me very happy.

Glad to hear that!

> The big difference in lengths interferes with finetuning with hifi-gan.

The only reasonable way to generate mel-spectrograms for HiFi-GAN fine-tuning is to use ground-truth pitch and ground-truth durations during synthesis with FastPitch. Otherwise the synthesized spectrogram will not line up frame by frame with the recorded audio, and the L2 loss in HiFi-GAN will not make any sense.
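To illustrate why the durations matter for length, here is a toy sketch of duration-based expansion in plain Python. It is not FastPitch code: `expand_by_durations` and the duration values are made up for illustration. It assumes each input symbol gets an integer duration in mel frames and that the ground-truth durations sum to the number of frames in the recorded mel-spectrogram.

```python
def expand_by_durations(symbols, durations):
    """Repeat each input symbol once per mel frame assigned to it."""
    if len(symbols) != len(durations):
        raise ValueError("need exactly one duration per symbol")
    frames = []
    for symbol, n_frames in zip(symbols, durations):
        frames.extend([symbol] * n_frames)
    return frames


symbols = ["h", "e", "y"]
target_frames = 12            # hypothetical length of the recorded mel

gt_durations = [3, 5, 4]      # hypothetical ground truth, sums to 12
pred_durations = [2, 6, 5]    # hypothetical prediction, sums to 13

gt_len = len(expand_by_durations(symbols, gt_durations))
pred_len = len(expand_by_durations(symbols, pred_durations))

print(gt_len == target_frames)    # lengths match the recording
print(pred_len == target_frames)  # lengths differ, so frames cannot be paired
```

With predicted durations the output can be longer or shorter than the recording, so there is no frame-to-frame correspondence for a spectrogram loss to compare.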

The easiest way to do this would be to capture the spectrograms here, because training already uses ground-truth conditioning to calculate the L2 loss. You can dump the spectrograms to disk inside the training loop and comment out the model updates.
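A rough sketch of what such a dump loop could look like in PyTorch follows. Everything specific here is an assumption for illustration: the function name `dump_mels`, the batch keys (`names`, `text`, `durations`, `pitch`, `mel_lens`), the model call signature, and the `(batch, n_mels, frames)` output layout are hypothetical and do not mirror the actual FastPitch training script, which would need to be adapted accordingly.

```python
from pathlib import Path

import torch


def dump_mels(model, loader, out_dir):
    """Run a model with ground-truth conditioning and save one mel per utterance.

    No loss is computed and no optimizer step is taken, so weights stay unchanged.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    model.eval()  # disables dropout; a choice made for this sketch
    with torch.no_grad():
        for batch in loader:
            # Hypothetical signature: ground-truth durations and pitch are
            # passed in instead of letting the model predict them.
            mel_out = model(
                batch["text"],
                durations=batch["durations"],
                pitch=batch["pitch"],
            )
            # Assumed layout: mel_out is (batch, n_mels, max_frames), padded.
            for name, mel, n_frames in zip(
                batch["names"], mel_out, batch["mel_lens"]
            ):
                path = out_dir / f"{name}.pt"
                # Drop padding frames so each file matches its recording.
                torch.save(mel[:, : int(n_frames)].clone().cpu(), path)
                saved.append(path)
    return saved
```

Trimming to the true frame count before saving keeps each dumped spectrogram the same length as the one computed from the matching audio file.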


[FastPitch 1.1/PyTorch] How to get target duration #1007