In Network Slimming, the repo uses `BN_grad_zero` to apply the mask to the network. I think this should only be used in the finetuning phase. Why is it also used in the training phase?
In practice, during the training phase the BN scaling factors never reach exactly zero, so the mask selects no channels and the gradient-zeroing is a no-op. It is therefore safe to keep it enabled during training; it does not influence the training process.
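To make the argument concrete, here is a minimal NumPy sketch of the idea (a hypothetical re-implementation, not the repo's actual PyTorch hook): gradients of BN parameters are zeroed only for channels whose scaling factor is exactly zero, so during training, when no factor is exactly zero, nothing changes.

```python
import numpy as np

def bn_grad_zero(gamma, grad_gamma, grad_beta):
    """Zero BN-parameter gradients for channels whose scaling factor
    is exactly zero (illustrative sketch of the masking logic)."""
    mask = (gamma != 0).astype(grad_gamma.dtype)
    return grad_gamma * mask, grad_beta * mask

# During training: factors are small but never exactly zero,
# so the mask is all ones and gradients pass through unchanged.
gamma_train = np.array([0.3, -1.2e-5, 0.8])
g_gamma = np.array([1.0, 1.0, 1.0])
g_beta = np.array([0.5, 0.5, 0.5])
gg_train, gb_train = bn_grad_zero(gamma_train, g_gamma, g_beta)
# gg_train → [1., 1., 1.]

# After pruning (finetuning): zeroed channels stay frozen.
gamma_pruned = np.array([0.3, 0.0, 0.8])
gg_pruned, gb_pruned = bn_grad_zero(gamma_pruned, g_gamma, g_beta)
# gg_pruned → [1., 0., 1.]
```

So applying the hook unconditionally is harmless in training and only takes effect once channels have actually been zeroed out for finetuning.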
Link: https://github.com/Eric-mingjie/rethinking-network-pruning/blob/74166a7e18d4a2d1dfe07fbac1ff7e5cf38935cb/imagenet/network-slimming/main.py#L225