talbaumel opened this issue 7 years ago
When convolving inputs, the zero padding added to the top rows of the input layer makes sure that a hidden state does not contain information from future words.
I feel like zero padding should be used in every convolution layer, like this: https://github.com/openai/pixel-cnn/blob/master/pixel_cnn_pp/nn.py#L296
@ruotianluo Zero padding is used in every layer to keep the layer size the same: https://github.com/anantzoid/Language-Modeling-GatedCNN/blob/master/model.py#L62 The zero padding I referred to in the above comment is the extra padding required to prevent the filter from seeing future words.
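For reference, the input-side masking being discussed looks roughly like this (a sketch only; the shapes and the multiply into the embedding are my assumptions, not the repo's exact code):

```python
import numpy as np

# Illustrative config values (stand-ins for conf.* in the repo).
batch_size, text_size, embedding_size, filter_h = 32, 21, 128, 5

# Zero out the first filter_h/2 time steps of the embedded input,
# then multiply the mask into the embedding before the first conv layer.
mask_layer = np.ones((batch_size, text_size, embedding_size), dtype=np.float32)
mask_layer[:, 0:filter_h // 2, :] = 0
# embed = embed * mask_layer   # embed: [batch, time, emb] from the embedding lookup
```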
Can padding only `mask_layer[:, 0:conf.filter_h/2, :] = 0` prevent the filter from seeing future words? Why not `conf.filter_h - 1`?
Does padding only at the first layer prevent the filter from seeing future words? Sorry, I can't understand it; can you explain in detail? Thank you very much.
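One way to see why `filter_h - 1` comes up: for the output at step t to depend only on inputs at steps <= t, a width-k window must cover x[t-(k-1) .. t], which means all k-1 pads go on the left; k/2 pads only recenter half of the window. A tiny arithmetic sketch with a hypothetical k:

```python
k = 5  # hypothetical filter width (conf.filter_h)

# SAME padding (stride 1) splits the k-1 total pads across both sides,
# giving the right side the extra pad when the split is uneven:
left, right = (k - 1) // 2, (k - 1) - (k - 1) // 2
print(left, right)   # 2 2 -> output[t] spans x[t-2 .. t+2]: two future steps

# Causal padding puts all k-1 pads on the left:
left, right = k - 1, 0
print(left, right)   # 4 0 -> output[t] spans x[t-4 .. t]: no future steps
```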
Yes, I have the same concern here. I printed some trace messages:
```
xbatch[0] = [[  1   1   3  13 123   5  12 152   7  84 129  21 106  48   5  14  89  30   6 140   6]
             [ 57  88   5  25  60  23   2   4   1   1   3  13  51  10  22 136  68  28 105   6  52]
             [104 121  11  54  10 134  10 138  22  64 151  47 133  69   2   4   1   1   3  13  97]]

ybatch[0] = [[  1   3  13 123   5  12 152   7  84 129  21 106  48   5  14  89  30   6 140   6 118]
             [ 88   5  25  60  23   2   4   1   1   3  13  51  10  22 136  68  28 105   6  52  90]
             [121  11  54  10 134  10 138  22  64 151  47 133  69   2   4   1   1   3  13  97  46]]
```
@thangduong I agree with you. I found that the mask and padding are only applied at the embedding layer, while the subsequent conv layers have none. I suspect this lets future information be peeked at by the middle conv layers. What do you think about it now?
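You can check the leak numerically: with a single centered (SAME-style) convolution, the output one step before a "future" token already picks it up. A minimal numpy sketch (the repo uses TF convs, but the window behaviour is the same):

```python
import numpy as np

k = 3                                  # filter width (conf.filter_h)
x = np.zeros(8, dtype=np.float32)
x[5] = 1.0                             # a lone "future" token at step 5

# One SAME-style (centered) convolution; an all-ones kernel for visibility.
h = np.convolve(x, np.ones(k), mode='same')
print(np.nonzero(h)[0])                # [4 5 6] -> step 4 already sees step 5
```

Stacking more such layers only widens the leak, since each layer looks another k // 2 steps into the future.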
> @ruotianluo Zero padding is used in every layer to keep the layer size the same: https://github.com/anantzoid/Language-Modeling-GatedCNN/blob/master/model.py#L62 The zero padding I referred to in the above comment is the extra padding required to prevent the filter from seeing future words.
Do you mean you used SAME padding there, which would add zero padding to produce same-size output and also prevent the next conv layer from seeing future information?
If so, I don't think that is correct. SAME padding in TensorFlow pads the left and right sides as evenly as possible; if the total is odd, the right side gets one more pad. But if you want to prevent seeing the future, all of the padding should be on the left side.
@sonack I agree with you. We need to pad filter_size - 1 zeros on the left at each layer.
@qixiang109 Are you working on this gated CNN? Have you successfully reproduced the paper's results? I hope we can communicate with each other :)
@sonack Sorry, I haven't checked this thread in a long time. I haven't actually run experiments with gated CNN language models yet, but I am quite sure about how the zeros should be padded.
Let's say a sentence in the dataset is (1, 2, 3, 4). Then the prepare_data function will create: X = (1, 2, 3), Y = (2, 3, 4).
While predicting 2 and 3, your model can simply copy them from the input.
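A sketch of that shift (hypothetical prepare_data behaviour, inferred from the trace output above):

```python
import numpy as np

sentence = np.array([1, 2, 3, 4])
X, Y = sentence[:-1], sentence[1:]   # X = (1, 2, 3), Y = (2, 3, 4)

# Without causal padding, the conv output at step t can also see X[t+1],
# which equals Y[t] -- so the model can score well by copying its input.
```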