Eric-mingjie / rethinking-network-pruning

Rethinking the Value of Network Pruning (PyTorch) (ICLR 2019)
MIT License

Lottery ticket gradient masking #23

Closed kudkudak closed 5 years ago

kudkudak commented 5 years ago

A quick question: in the LTH experiments in https://github.com/Eric-mingjie/rethinking-network-pruning/blob/master/cifar/lottery-ticket/weight-level/lottery_ticket.py#L293, the gradients of the masked-out weights are zeroed after each backward pass. But those weights themselves still take part in the backward pass. In other words, the backward pass does not seem equivalent to the backward pass of the corresponding thin network initialized from the lottery ticket. Is this intentional, or have I misunderstood something? Thanks!
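For context, a minimal sketch of the gradient-masking pattern being discussed (not the repo's exact code; the layer, mask, and loss here are made up for illustration): prune by zeroing weights, then zero the gradients of the pruned positions after every backward pass so the optimizer step keeps them at zero.

```python
import torch
import torch.nn as nn

# Hypothetical tiny layer and a random 0/1 pruning mask.
torch.manual_seed(0)
model = nn.Linear(4, 2)
mask = (torch.rand_like(model.weight) > 0.5).float()

# Prune: zero the masked-out weights in place.
with torch.no_grad():
    model.weight.mul_(mask)

# One training step with a dummy loss.
x = torch.randn(3, 4)
loss = model(x).pow(2).sum()
loss.backward()

# Zero the gradients of the pruned weights so an optimizer step
# cannot revive them -- this is the masking step in question.
model.weight.grad.mul_(mask)

# Pruned positions now have both zero weight and zero gradient.
assert torch.all(model.weight[mask == 0] == 0)
assert torch.all(model.weight.grad[mask == 0] == 0)
```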

kudkudak commented 5 years ago

For comparison, see the original implementation: https://github.com/google-research/lottery-ticket-hypothesis/blob/master/foundations/model_base.py#L110. They use separate mask weights, so I think their backward pass is equivalent to that of the corresponding thin network?

Eric-mingjie commented 5 years ago

> But gradients of these zeroed weights take part in the backward pass.

The zeroed weights have value zero, so in the backward pass they contribute nothing to the gradients flowing to earlier layers; effectively, they don't take part in the backward pass.
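A tiny autograd check (my own example, not from the repo) makes the reply concrete: with a weight held at zero, the gradient flowing back to the input through that weight is zero, so upstream layers see the same backward pass as in the thin network. The weight itself still receives a nonzero gradient, which is why the code also zeroes those gradients after backward.

```python
import torch

# Scalar "layer": y = w * x, with the weight pruned to zero.
w = torch.zeros(1, requires_grad=True)
x = torch.ones(1, requires_grad=True)
y = (w * x).sum()
y.backward()

# dy/dx = w = 0: the zeroed weight contributes nothing upstream.
print(x.grad.item())  # 0.0
# dy/dw = x = 1: the pruned weight itself still gets a gradient,
# hence the explicit gradient masking after backward.
print(w.grad.item())  # 1.0
```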

kudkudak commented 5 years ago

Ah. Ok. My bad :) Thanks