Eric-mingjie / rethinking-network-pruning

Rethinking the Value of Network Pruning (Pytorch) (ICLR 2019)

updateBN #51

Open wjsteve opened 3 years ago

wjsteve commented 3 years ago

Hello. I have a question that came up while reproducing your interesting experiment.

In Section 2 of https://arxiv.org/pdf/1708.06519.pdf, "Scaling Factors and Sparsity-induced Penalty" gives the following objective:

L = Σ_{(x,y)} l(f(x, W), y) + λ Σ_{γ∈Γ} g(γ), with g(s) = |s| (an L1 penalty on the BN scaling factors).
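In code, I understand this objective as something like the following (a minimal sketch of my reading, not the repo's code; `model` and `criterion` are placeholders I'm assuming):

```python
import torch.nn as nn

sparsity = 1e-4  # the lambda in the objective; value assumed for illustration

def loss_with_penalty(model, output, target, criterion):
    # Standard training loss plus lambda * sum over all BN scaling
    # factors gamma of g(gamma) = |gamma|.
    l1 = sum(m.weight.abs().sum()
             for m in model.modules() if isinstance(m, nn.BatchNorm2d))
    return criterion(output, target) + sparsity * l1
```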

Question: g(γ) is the L1 norm, but https://github.com/Eric-mingjie/rethinking-network-pruning/blob/master/imagenet/network-slimming/main_finetune.py#L187 applies torch.sign, as in `m.weight.grad.data.add_(sparsity * torch.sign(m.weight.data))`, rather than the L1 norm itself.

So shouldn't updateBN use `m.weight.grad.data.add_(sparsity * m.weight.data.abs())` instead?
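For context, the quoted line sits in a loop over the BN layers, roughly like this (a sketch reconstructed from the quoted line; the exact loop structure and signature are my assumption):

```python
import torch
import torch.nn as nn

def updateBN(model, sparsity):
    # Called after loss.backward(): adds the (sub)gradient of the
    # sparsity penalty to the existing BN weight gradients.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(sparsity * torch.sign(m.weight.data))
```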

Eric-mingjie commented 3 years ago

The gradient of the L1 norm is the sign function: d|γ|/dγ = sign(γ) for γ ≠ 0. updateBN adds the gradient of the penalty term to the BN weight gradients, so λ · sign(γ) is correct; `m.weight.data.abs()` would be the penalty value g(γ) itself, not its gradient.
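A quick autograd check illustrates this (my example, not from the repo): differentiating `sparsity * gamma.abs().sum()` yields exactly `sparsity * sign(gamma)`.

```python
import torch

sparsity = 1e-4  # illustrative value
gamma = torch.tensor([0.5, -1.2, 2.0], requires_grad=True)
penalty = sparsity * gamma.abs().sum()  # lambda * sum_i |gamma_i|
penalty.backward()

print(gamma.grad)                             # tensor([ 1.0000e-04, -1.0000e-04,  1.0000e-04])
print(sparsity * torch.sign(gamma.detach()))  # same values
```

So adding `sparsity * torch.sign(m.weight.data)` to the gradient is equivalent to putting the L1 penalty into the loss and calling backward().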