In Network Slimming, the repo uses `BN_grad_zero` to apply the mask to the network. I think this should only be used in the finetuning phase. Why is it also used in the training phase?
In practice, during the training phase the BN scaling factors never reach exactly zero, so the mask selects no channels and the gradient-zeroing is a no-op. It is therefore safe to keep it enabled during training; it does not influence the training process.
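To make the argument concrete, here is a minimal NumPy sketch of the idea (a hypothetical re-implementation, not the repo's actual PyTorch hook): gradients of BN parameters are zeroed only for channels whose scaling factor is exactly zero, so during training, when no factor is exactly zero, nothing changes.

```python
import numpy as np

def bn_grad_zero(gamma, grad_gamma, grad_beta):
    """Zero BN-parameter gradients for channels whose scaling factor
    is exactly zero (illustrative sketch of the masking logic)."""
    mask = (gamma != 0).astype(grad_gamma.dtype)
    return grad_gamma * mask, grad_beta * mask

# During training: factors are small but never exactly zero,
# so the mask is all ones and gradients pass through unchanged.
gamma_train = np.array([0.3, -1.2e-5, 0.8])
g_gamma = np.array([1.0, 1.0, 1.0])
g_beta = np.array([0.5, 0.5, 0.5])
gg_train, gb_train = bn_grad_zero(gamma_train, g_gamma, g_beta)
# gg_train → [1., 1., 1.]

# After pruning (finetuning): zeroed channels stay frozen.
gamma_pruned = np.array([0.3, 0.0, 0.8])
gg_pruned, gb_pruned = bn_grad_zero(gamma_pruned, g_gamma, g_beta)
# gg_pruned → [1., 0., 1.]
```

So applying the hook unconditionally is harmless in training and only takes effect once channels have actually been zeroed out for finetuning.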
Link: https://github.com/Eric-mingjie/rethinking-network-pruning/blob/74166a7e18d4a2d1dfe07fbac1ff7e5cf38935cb/imagenet/network-slimming/main.py#L225