Hello,
I generate mel-spectrograms and synthesize speech from them using both HiFi-GAN and Griffin-Lim.
However, the audio from both vocoders has a severe mechanical sound. I kept training the model (more than 60,000 steps), but the degree of mechanical sound has not changed since the beginning.
My hyperparameters are the following:
batch size: 32
weight_decay=1e-6
p_attention_dropout=0.1
p_decoder_dropout=0.1
learning_rate=1e-3
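In case it is easier to copy, here are the same settings as a plain Python dictionary. This is only a sketch for reference, not my actual config file: the variable name `hparams` and the key spelling `batch_size` are assumptions for illustration, while the other keys and all values are the ones listed above.

```python
# Hyperparameters from this post, collected in one place.
# NOTE: `hparams` and the key name `batch_size` are illustrative assumptions;
# the remaining keys and all values are copied from the list above.
hparams = {
    "batch_size": 32,
    "weight_decay": 1e-6,
    "p_attention_dropout": 0.1,
    "p_decoder_dropout": 0.1,
    "learning_rate": 1e-3,
}

# Print the settings in a stable order so they can be pasted into a reply.
for name in sorted(hparams):
    print(f"{name} = {hparams[name]}")
```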
My TensorBoard data is the following: