In my humble opinion, the dropout layers in the convolutions can be removed, as suggested by sources like this blog and this reddit post. Convolution layers have fewer parameters, so the regularization that dropout provides may not be effective there. Furthermore, the convolution layers appear in the encoder, postnet, and CBHG, where dropout might have little effect on mel-spectrogram prediction. In contrast, I suggest keeping the dropout in the decoder prenet and the zoneout in the LSTM. The loss drops noticeably with this change. The evaluation result is as follows. See https://github.com/begeekmyfriend/Tacotron-2/commit/a050b198e8831e950ae20a8646657dd8680cf7af mandarin_male_gl_no_dropout.zip
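To make the change concrete, here is a minimal sketch in TensorFlow 1.x style. The helper names and hyperparameters below are mine for illustration, not the repo's actual code; see the linked commit for the real diff. The idea is that the conv blocks lose their trailing dropout, while the decoder prenet keeps its dropout (which the Tacotron 2 paper applies even at inference time to introduce output variation):

```python
import tensorflow as tf


def conv_block(x, filters, kernel_size, is_training):
    """Conv1D -> BatchNorm -> ReLU, with the trailing dropout removed.

    These blocks make up the encoder, postnet and CBHG conv stacks;
    previously each one ended with tf.layers.dropout(...).
    """
    x = tf.layers.conv1d(x, filters, kernel_size, padding='same')
    x = tf.layers.batch_normalization(x, training=is_training)
    return tf.nn.relu(x)
    # Before the change, the block ended with:
    # return tf.layers.dropout(tf.nn.relu(x), rate=0.5, training=is_training)


def prenet(x, layer_sizes=(256, 256)):
    """Decoder prenet: dropout is kept here.

    Note it stays active even at inference (training=True), as in the
    Tacotron 2 paper.
    """
    for size in layer_sizes:
        x = tf.layers.dense(x, size, activation=tf.nn.relu)
        x = tf.layers.dropout(x, rate=0.5, training=True)  # always on
    return x
```

The zoneout regularization inside the decoder LSTM cells is likewise left untouched; only the dropout attached to the convolutional stacks is removed.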