Cheneng / DPCNN

Deep Pyramid Convolutional Neural Networks for Text Categorization in PyTorch
http://ai.tencent.com/ailab/media/publications/ACL3-Brady.pdf

Reuse of conv3 layer #6

Open thedomdom opened 4 years ago

thedomdom commented 4 years ago

I think it is wrong that the same conv3 layer is used for every convolution in the network. You implemented it as if the convolutions shared weights; however, in the paper, the convolutions don't share weights.

Have a look at the TensorFlow implementation over here: it uses different weights for each conv3 layer.

Please review your code.
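
For illustration (not part of the original comment), a minimal PyTorch sketch of the difference; the channel size and block count here are made up:

    import torch.nn as nn

    # Reusing one module: every call runs through the same weight tensor,
    # so all blocks train a single set of conv weights.
    shared_conv = nn.Conv2d(250, 250, kernel_size=(3, 1))  # channel size is illustrative

    # Separate modules, as in the paper: each block gets its own weights.
    convs = nn.ModuleList(nn.Conv2d(250, 250, kernel_size=(3, 1)) for _ in range(7))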

qingyujean commented 3 years ago

I think different weights should be used there, too. @thedomdom Every block shares the same convolution layer; is this reasonable? @Cheneng Could you tell me why you implemented it like that? Do you have a special reason or idea behind it?

thedomdom commented 3 years ago

No, I don't think that it is reasonable that every block shares the same conv layer weights.

qingyujean commented 3 years ago

But your code looks like every block uses only one conv layer, right? Or maybe I misunderstand that part of your model construction? I'm a little confused about it... I think a new conv block should be created at every step in the for-loop, right?


david-waterworth commented 3 years ago

I agree with the above: self.conv3 is reused multiple times, twice per block, so there's only one set of conv weights. I think the reuse of the other modules (the activation and padding) is ok since those modules don't have state, but unless the intent is specifically to have only a single set of conv weights, this doesn't look correct.

I'd be inclined to factor out the following sequence into a separate module so it can be reused (with different layers in place of the repeated conv3):

    # Convolution (pad -> ReLU -> conv), applied twice per block.
    # Note: self.conv3 is the same module both times, so its weights are shared.
    x = self.padding_conv(px)
    x = F.relu(x)
    x = self.conv3(x)

    x = self.padding_conv(x)
    x = F.relu(x)
    x = self.conv3(x)
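
As a rough sketch of what that separate module could look like, one instance per pyramid level so no conv weights are shared (the module and attribute names, channel size, and residual placement below are illustrative, not taken from the repository):

    import torch.nn as nn
    import torch.nn.functional as F

    class ConvBlock(nn.Module):
        """Two pad -> ReLU -> conv steps, each with its own (non-shared) conv weights."""

        def __init__(self, channels=250):                  # channel size is illustrative
            super().__init__()
            self.padding_conv = nn.ZeroPad2d((0, 0, 1, 1))  # stateless, safe to reuse
            self.conv_a = nn.Conv2d(channels, channels, kernel_size=(3, 1))
            self.conv_b = nn.Conv2d(channels, channels, kernel_size=(3, 1))

        def forward(self, px):
            x = self.conv_a(F.relu(self.padding_conv(px)))
            x = self.conv_b(F.relu(self.padding_conv(x)))
            return x + px                                   # residual shortcut as in DPCNN

    # In the model, one block per repetition of the loop instead of one shared conv3:
    # self.blocks = nn.ModuleList(ConvBlock(250) for _ in range(num_blocks))

Only the conv layers need distinct parameters; the padding module and the functional ReLU carry no state, so reusing them (as noted above) is fine.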