the initialization of batchnorm and some other layers

AlexHex7 / Non-local_pytorch

Implementation of Non-local Block.

Apache License 2.0


the initialization of batchnorm and some other layers #22

Closed qinziqiao closed 5 years ago

qinziqiao commented 5 years ago

nn.init.constant_(self.W[1].weight, 0)
nn.init.constant_(self.W[1].bias, 0)

Hi. I have a question: why is bn.weight initialized to zero?

AlexHex7 commented 5 years ago

@qinziqiao Hi, you can find the reason in Section 4.1 of the paper, which says:

The scale parameter of this BN layer is initialized as zero, following [17]. This ensures that the initial state of the entire non-local block is an identity mapping, so it can be inserted into any pre-trained networks while maintaining its initial behavior.
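To make the identity-mapping argument concrete, here is a minimal framework-free sketch, not the repository's code. It assumes the block output has the residual form z = BN(y) + x and leaves out the 1x1 convolution that precedes the BN layer. The helper names `batch_norm` and `non_local_output` and the sample values are hypothetical, chosen only for illustration.

```python
import math

def batch_norm(values, gamma, beta, eps=1e-5):
    """Normalize a list of activations, then apply scale (gamma) and shift (beta)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

def non_local_output(x, y, gamma, beta):
    """Residual form z = BN(y) + x (assumption: the 1x1 conv before BN is omitted)."""
    return [w + xi for w, xi in zip(batch_norm(y, gamma, beta), x)]

x = [0.5, -1.25, 3.0, 2.0]   # hypothetical block input
y = [4.0, 0.1, -2.5, 7.0]    # hypothetical output of the attention branch

# Scale and shift both zero: the BN branch contributes nothing, so z equals x.
z_zero = non_local_output(x, y, gamma=0.0, beta=0.0)

# Default scale of one: the branch changes the output from the first step.
z_default = non_local_output(x, y, gamma=1.0, beta=0.0)

print(z_zero == x)
print(z_default == x)
```

With both parameters at zero, the normalized branch is multiplied by zero regardless of what the attention branch produced, so only the residual path survives. That is why the block can be dropped into a pre-trained network without changing its initial behavior.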

qinziqiao commented 5 years ago

Thanks for your reply. I was careless and overlooked that line.