Closed PkuRainBow closed 6 years ago
Because it makes the output of the block always zero (on the first batch, before the parameters are updated). In this way, it can be inserted into any existing architecture without affecting the architecture's original outputs. You can find this in Section 3.3 of the paper, which says:
The residual connection allows us to insert a new non-local block into any pre-trained model, without breaking its initial behavior (e.g., if Wz is initialized as zero).
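A minimal sketch of the idea, assuming a simplified PyTorch non-local block (layer names like `theta`, `phi`, `g`, and `W` are illustrative, not the exact code from this repo): zero-initializing the final projection `W` makes the residual branch output zero, so the block is an identity mapping at initialization.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Simplified non-local block with a residual connection z = x + W(y)."""
    def __init__(self, channels):
        super().__init__()
        # Embedding transforms (1x1 convolutions)
        self.theta = nn.Conv2d(channels, channels // 2, 1)
        self.phi = nn.Conv2d(channels, channels // 2, 1)
        self.g = nn.Conv2d(channels, channels // 2, 1)
        # Final projection back to the input channel count
        self.W = nn.Conv2d(channels // 2, channels, 1)
        # Zero-init weight AND bias, so W(y) == 0 before any training step
        nn.init.zeros_(self.W.weight)
        nn.init.zeros_(self.W.bias)

    def forward(self, x):
        b, c, h, w = x.shape
        theta = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, c/2)
        phi = self.phi(x).flatten(2)                      # (b, c/2, hw)
        g = self.g(x).flatten(2).transpose(1, 2)          # (b, hw, c/2)
        attn = torch.softmax(theta @ phi, dim=-1)         # (b, hw, hw)
        y = (attn @ g).transpose(1, 2).reshape(b, c // 2, h, w)
        # At initialization self.W(y) is all zeros, so the block returns x
        return x + self.W(y)
```

You can check the identity behavior directly: `block(x)` equals `x` before any optimizer step, so dropping the block into a pre-trained network leaves its predictions unchanged until training starts adapting `W`.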
I just cannot figure out why the weights and biases within self.W are initialized to zero.