Hi, sorry to bother you again. In basicsr/pruner/SSL_pruner.py, def _apply_reg(self) (quoted below), my understanding is that this function adds extra gradients to the scaling factors to push them toward further sparsity. Should the line

m.act_scale_pre.grad += reg_pre[:, 0].view(1, -1, 1, 1) * m.act_scale

be changed to

m.act_scale_pre.grad += reg_pre[:, 0].view(1, -1, 1, 1) * m.act_scale_pre

My reasoning is that the added gradient has to be proportional to m.act_scale_pre itself for the sparsity regularization to act on m.act_scale_pre (a small sketch of that reasoning follows the quoted code).
def _apply_reg(self):
    for name, m in self.model.named_modules():
        if name in self.layers and self.pr[name] > 0:
            reg = self.reg[name]  # [N, C]
            m.act_scale.grad += reg[:, 0].view(1, -1, 1, 1) * m.act_scale
            if hasattr(m, 'act_scale_pre'):
                reg_pre = self.reg_pre[name]
                # change this or not:
                m.act_scale_pre.grad += reg_pre[:, 0].view(1, -1, 1, 1) * m.act_scale
                # bias = False if isinstance(m.bias, type(None)) else True
                # if bias:
                #     m.bias.grad += reg[:, 0] * m.bias