NVIDIA / tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference

noise problem #539

Closed: donghahaha closed this issue 2 years ago

donghahaha commented 2 years ago

Hello,

I generate mel-spectrograms and synthesize speech from them using HiFi-GAN and Griffin-Lim.

However, the speech synthesized with both vocoders has a severe mechanical sound. I therefore continued training the model (more than 60,000 steps), but the mechanical sound is no better than it was at the beginning.

My hyperparameters are the following:

batch size: 32
weight_decay=1e-6
p_attention_dropout=0.1
p_decoder_dropout=0.1
learning_rate=1e-3
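For reference, here is a minimal sketch of how these settings could be passed to training as overrides. It assumes that train.py's --hparams option takes comma-separated name=value pairs and that "batch size" corresponds to the batch_size field. format_hparams is a hypothetical helper written for this illustration, and outdir and logdir are placeholder directories.

```python
# Sketch only: build an override string for the settings listed above.
# Assumption: train.py accepts --hparams as comma-separated name=value pairs.
overrides = {
    "batch_size": 32,              # assumed field name for "batch size"
    "weight_decay": 1e-6,
    "p_attention_dropout": 0.1,
    "p_decoder_dropout": 0.1,
    "learning_rate": 1e-3,
}


def format_hparams(values):
    """Hypothetical helper: join a dict into 'name=value,name=value'."""
    return ",".join(f"{name}={value}" for name, value in values.items())


hparams_arg = format_hparams(overrides)

# outdir and logdir are placeholder directory names.
command = (
    "python train.py --output_directory=outdir --log_directory=logdir "
    f"--hparams={hparams_arg}"
)
print(command)
```

Writing the overrides out this way makes it easier to compare the exact values against the repository defaults when reporting a problem.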

My data on the tensorboard is the following:

[Four TensorBoard screenshots; images not preserved]