Spotlight0xff opened 7 years ago
Hi @Spotlight0xff. conv1d is a function defined in modules.py: https://github.com/Kyubyong/tacotron/blob/master/modules.py#L43. See https://github.com/Kyubyong/tacotron/blob/master/modules.py#L64: batch-norm is used there.
https://github.com/Kyubyong/tacotron/blob/master/networks.py#L38 has bn=False, so I believe there is no batch normalization for that layer. Am I missing something?
@msobhan69 If you look at https://github.com/Kyubyong/tacotron/blob/master/networks.py#L38, which is the second conv layer of the projections, bn=False disables https://github.com/Kyubyong/tacotron/blob/master/modules.py#L64, so batch normalization is not applied to the last projection layer.
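For reference, a conv1d wrapper with an optional bn flag typically looks something like this (a minimal TF 1.x sketch, not the repo's exact code; the argument names are assumptions):

```python
import tensorflow as tf

def conv1d_block(inputs, filters, size, activation=None, bn=True,
                 is_training=True, scope="conv1d_block"):
    """Illustrative conv1d block with an optional batch-norm step."""
    with tf.variable_scope(scope):
        out = tf.layers.conv1d(inputs, filters=filters, kernel_size=size,
                               padding="same")
        if bn:  # passing bn=False skips normalization entirely
            out = tf.layers.batch_normalization(out, training=is_training)
        if activation is not None:
            out = activation(out)
        return out
```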
If you see Table 1 on page 4 of the paper, the second layer of the Conv1D projections is described as conv-3-128-Linear. If we don't apply activation, we shouldn't normalize, either.
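For concreteness, here is a sketch (TF 1.x style, not the repo's code) of the encoder Conv1D-projection stack as I read Table 1: conv-3-128-ReLU followed by conv-3-128-Linear.

```python
import tensorflow as tf

def conv1d_projections(inputs, is_training):
    # Layer 1, conv-3-128-ReLU: conv, then batch norm, then ReLU.
    out = tf.layers.conv1d(inputs, filters=128, kernel_size=3, padding="same")
    out = tf.layers.batch_normalization(out, training=is_training)
    out = tf.nn.relu(out)
    # Layer 2, conv-3-128-Linear: no activation; in this repo it is also built
    # with bn=False, which is what this issue questions.
    out = tf.layers.conv1d(out, filters=128, kernel_size=3, padding="same")
    return out
```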
@Kyubyong Yes, it is true that Table 1 of the original paper describes the second Conv1D projection layer as conv-3-128-Linear. But the paper also mentions that batch-norm is used for all convolutional layers. I think there is no conflict between batch normalization and the activation function: we can use a conv layer with a linear activation and batch normalization (with no activation) at the same time. So my doubt is about the statement "If we don't apply activation, we shouldn't normalize, either." @Kyubyong
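To make the point concrete: the normalization and the nonlinearity are independent choices, e.g. in TF 1.x (placeholder tensors, just a sketch):

```python
import tensorflow as tf

inputs = tf.random_normal([32, 100, 128])  # placeholder, [batch, time, channels]
is_training = True

# A linear (identity-activation) conv layer ...
out = tf.layers.conv1d(inputs, filters=128, kernel_size=3, padding="same",
                       activation=None)
# ... can still be followed by batch normalization; the two are orthogonal.
out = tf.layers.batch_normalization(out, training=is_training)
```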
@candlewill Thanks for your question. Well, what I meant was that we shouldn't apply an activation or normalization before the final layer, because usually we want to yield logits. In this case, I thought we needed unnormalized outputs, because I guessed that's why a linear activation was used instead of a non-linear one. Honestly, I don't understand why a linear activation is used for the second conv1d layer. Is it because of the residual connection? When applying the residual connection, should prenet_out, which is unnormalized, be added to an unnormalized tensor? Or does it not matter?
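For reference, the residual connection itself is just an element-wise add; a minimal sketch with placeholder tensors (names and shapes are assumptions, not the repo's code):

```python
import tensorflow as tf

# Stand-ins for the prenet output and the output of the second (linear,
# un-normalized) conv1d projection; shapes are [batch, time, 128].
prenet_out = tf.random_normal([32, 100, 128])
conv_proj_out = tf.random_normal([32, 100, 128])

# The residual connection is an element-wise sum. If conv_proj_out were
# batch-normalized, a normalized tensor would be added to the unnormalized
# prenet_out, which is the scale-mismatch concern raised above.
highway_input = conv_proj_out + prenet_out
```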
Hi,
Why is there no batch-normalization for the conv1d projection in the CBHG encoder network? The paper mentions that batch-norm is used for all convolutional layers, so why is this an exception?
The code in question: https://github.com/Kyubyong/tacotron/blob/master/networks.py#L38
Cheers, André