Thanks again for your great work and code.
I have tried different ways to handle pruned weights (i.e., keep them at zero), for example running the following line for each batch:
output.register_hook(lambda grad: grad * mask.float())
But this is very slow. I looked for your solution, since yours is much faster, but could not find the specific lines. Could you please explain what you do to prevent pruned weights from being updated during backpropagation?
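For reference, here is a minimal sketch of a common PyTorch pattern. It is not necessarily what your code does. It assumes a single `nn.Linear` layer and a fixed binary mask (`layer` and `mask` are hypothetical names). The hook is registered once on the weight parameter rather than on every batch: if the hooked tensor is a persistent parameter, calling `register_hook` each batch stacks up another hook every time. The mask is also precomputed as a float tensor, so it is not converted on every call. The weights are re-masked after `optimizer.step()` as a safety net in case optimizer state, such as momentum left over from before pruning, would otherwise move the pruned entries.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical setup: one linear layer and a fixed binary pruning mask (1 = keep, 0 = pruned).
layer = nn.Linear(8, 4)
mask = (torch.rand_like(layer.weight) > 0.5).float()

# Zero the pruned weights once, before training starts.
with torch.no_grad():
    layer.weight.mul_(mask)

# Register the gradient hook ONCE on the weight parameter (not once per batch).
# The returned tensor replaces the gradient, so pruned entries receive zero gradient.
layer.weight.register_hook(lambda grad: grad * mask)

optimizer = torch.optim.SGD(layer.parameters(), lr=0.1, momentum=0.9)

for step in range(5):
    x = torch.randn(16, 8)
    loss = layer(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Safety net: re-apply the mask after the update, in case optimizer state
    # (e.g. momentum accumulated before pruning) would move pruned weights.
    with torch.no_grad():
        layer.weight.mul_(mask)

pruned_ok = bool((layer.weight[mask == 0] == 0).all())
print(pruned_ok)
```

Another option is `torch.nn.utils.prune.custom_from_mask(layer, "weight", mask)`. It reparametrizes the weight as `weight_orig * weight_mask`, so the effective pruned weights stay zero without a manual hook. However, it adds a multiplication on every forward pass.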