Kyubyong / tacotron

A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
Apache License 2.0

no batch-norm for conv1d in encoder #12

Open Spotlight0xff opened 7 years ago

Spotlight0xff commented 7 years ago

Hi,

Why is there no batch-normalization for the conv1d projection in the CBHG encoder network? The paper mentions that batch-norm is used for all convolutional layers, so why is this an exception?

The code in question: https://github.com/Kyubyong/tacotron/blob/master/networks.py#L38

Cheers, André

msobhan69 commented 7 years ago

Hi @Spotlight0xff, the conv1d used there is a function defined in modules.py: https://github.com/Kyubyong/tacotron/blob/master/modules.py#L43

See https://github.com/Kyubyong/tacotron/blob/master/modules.py#L64: batch norm is used.
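For readers skimming the thread, here is a rough sketch of what a conv1d wrapper with an optional bn flag can look like (plain TensorFlow 1.x style; the function and argument names are illustrative, not the repo's exact code):

```python
import tensorflow as tf

def conv1d(inputs, filters, size, activation=None, bn=False, is_training=True, scope="conv1d"):
    # Illustrative wrapper: conv -> (optional) batch norm -> (optional) activation.
    with tf.variable_scope(scope):
        out = tf.layers.conv1d(inputs, filters=filters, kernel_size=size,
                               padding="same", activation=None)
        if bn:
            # Batch norm only runs when the caller passes bn=True.
            out = tf.layers.batch_normalization(out, training=is_training)
        if activation is not None:
            out = activation(out)
    return out
```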

onyedikilo commented 7 years ago

https://github.com/Kyubyong/tacotron/blob/master/networks.py#L38 has bn=False, so I believe there is no batch normalization for that layer. Am I missing something?

ghost commented 7 years ago

@msobhan69 If you look at https://github.com/Kyubyong/tacotron/blob/master/networks.py#L38, which is the second conv layer of the projection, it passes bn=False, which disables https://github.com/Kyubyong/tacotron/blob/master/modules.py#L64. So batch normalization is not used for the last of the projection layers.
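To make that concrete, the projection stack then behaves roughly like the following (hypothetical calls in the style of the sketch above; enc_outputs and is_training are placeholder names, and the filter sizes follow the paper's encoder table):

```python
# First projection layer: conv-3-128-ReLU in the paper's notation
# (kernel size 3, 128 filters, ReLU activation), with batch norm enabled.
proj = conv1d(enc_outputs, filters=128, size=3,
              activation=tf.nn.relu, bn=True, is_training=is_training)

# Second projection layer: conv-3-128-Linear, i.e. no activation, and here
# also bn=False, so batch norm is skipped for this layer as well.
proj = conv1d(proj, filters=128, size=3,
              activation=None, bn=False, is_training=is_training)
```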

Kyubyong commented 7 years ago

If you look at Table 1 on page 4 of the paper, the second layer of the Conv1D projections is described as conv-3-128-Linear. If we don't apply an activation, we shouldn't normalize either.

candlewill commented 7 years ago

@Kyubyong Yes, it is true that Table 1 of the original paper describes the second Conv1D projection layer as conv-3-128-Linear. But the paper also says that batch norm is used for all convolutional layers.

I think there is no conflict between batch normalization and the activation function. We can use a conv layer with a linear (i.e. no) activation and still apply batch normalization at the same time.
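A minimal sketch of that point in plain TensorFlow 1.x (none of this comes from the repo; the shapes are made up):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 100, 128])  # (batch, time, channels)
is_training = tf.placeholder(tf.bool, [])

# Linear (identity) activation: the conv itself applies no non-linearity...
h = tf.layers.conv1d(x, filters=128, kernel_size=3, padding="same", activation=None)
# ...and batch normalization can still be applied afterwards.
h = tf.layers.batch_normalization(h, training=is_training)
# No activation follows, so the layer stays "Linear" in the conv-3-128-Linear
# sense, yet it is still batch-normalized.
```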

So my doubt is about the claim that "if we don't apply activation, we shouldn't normalize, either". Why is that? @Kyubyong

Kyubyong commented 7 years ago

@candlewill Thanks for your question. What I meant was that we shouldn't apply activation or normalization before the final layer, because usually we want to yield logits. In this case, I thought we needed unnormalized outputs, and I guessed that's why a linear activation was used instead of a non-linear one. Honestly, I don't understand why a linear activation is used for the second conv1d layer. Is it because of the residual connection? When applying the residual connection, should prenet_out, which is unnormalized, be added to an unnormalized tensor? Or does it not matter?
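For context on the residual connection in question, the CBHG adds the projection output back to its input (the prenet output in the encoder); roughly, and with made-up variable names:

```python
# Residual connection (sketch): the output of the Conv1D projections is added
# element-wise to the CBHG input, i.e. the unnormalized prenet output,
# before going into the highway layers.
highway_input = proj + prenet_out
```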