as-ideas / TransformerTTS

🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.
https://as-ideas.github.io/TransformerTTS/

Remove the last normalization layer in Postnet. #86

Closed taylorlu closed 3 years ago

taylorlu commented 3 years ago

The last normalization layer should be removed from the Postnet: mel_linear and final_output must stay nearly identical, so the Postnet's residual correction has to shrink toward zero, but a final normalization layer rescales that correction to unit variance. Training both at the same time becomes a contradiction and makes the residual module lose its effect.

Here are the loss curves when training with and without the norm layer:

[Two screenshots: loss curves with and without the final norm layer]

Refer to the codes by NVIDIA's: https://github.com/NVIDIA/tacotron2/blob/185cd24e046cc1304b4f8e564734d2498c6e2e6f/model.py#L141-L144 https://github.com/NVIDIA/tacotron2/blob/185cd24e046cc1304b4f8e564734d2498c6e2e6f/model.py#L510-L511
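A minimal NumPy sketch of the conflict described above. The variable names (`mel_linear`, `residual`) and shapes are illustrative, not taken from this repo's code; `batch_norm` mimics only the train-time normalization step of a BatchNorm layer.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize over the batch axis, as a BatchNorm layer does at train time.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
mel_linear = rng.normal(size=(8, 80))        # hypothetical decoder mel output
residual = 1e-3 * rng.normal(size=(8, 80))   # near-zero Postnet correction

# Without a final norm layer the correction can stay tiny, so
# final_output ~= mel_linear, which is what the residual path wants:
out_no_norm = mel_linear + residual

# With a final norm layer the tiny correction is rescaled to unit
# variance, so it can never shrink toward zero -- the contradiction:
out_with_norm = mel_linear + batch_norm(residual)

print(np.abs(out_no_norm - mel_linear).max())    # small
print(np.abs(out_with_norm - mel_linear).max())  # order 1
```

However small the Postnet's raw output becomes, the normalization re-inflates it, which is why the linked NVIDIA Tacotron 2 code leaves the last convolution without a norm layer.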

cfrancesco commented 3 years ago

Hi, thank you for the very valuable insight and experiment. I will not merge the PR because we are switching to the main branch, where the convolutions in the Postnet were removed entirely, and I will leave this branch as it is to stay compatible with the current pre-trained models.