zxd-cqu opened this issue 1 year ago
Yes, more details can be found in the research paper and source code.
However, directly modifying the code to use l1-norm as the criterion does not seem to work, because the l1-norm of the filters cannot be used to prune the input to the first convolutional layer of residual blocks...
I don't follow your point. That case is actually handled in the code; please take a look:
```python
for name, module in model.named_modules():
    if name in layers:
        layer = layers[name]
        out = get_score_layer(name, module, wg=wg, criterion=criterion)
        score = out['score']
        layer.score = score
        layer.prescore = out['act_scale_pre']
        if raw_pr[name] > 0:  # pr > 0 indicates we want to prune this layer, so its score will be included in <all_scores>
            all_scores = np.append(all_scores, score)
            if hasattr(module, 'act_scale_pre'):
                all_scores = np.append(all_scores, out["act_scale_pre"])
```
In this code snippet, if the criterion is set to l1-norm, then score is the L1-norm of the filters, and it is appended to the all_scores array. In addition, all_scores = np.append(all_scores, out["act_scale_pre"]) appends the scaling factors to all_scores as scores as well. These seem to be two different kinds of scores, yet they are combined for importance ranking.
They essentially represent the same thing: the importance of filters. The separate naming is simply for code organization and readability. I still don't see your issue.
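For context, pooled scores like these are typically turned into a pruning decision with a single global threshold: sort everything in all_scores and prune whatever falls below the k-th smallest value. A minimal numpy sketch of that global-ranking step (the score values and the 30% pruning ratio are made up for illustration; this is not the repository's code):

```python
import numpy as np

# Hypothetical per-filter scores from two sources: filter L1-norms and
# scaling-factor magnitudes, pooled into one array as in the snippet above.
l1_scores = np.array([0.02, 0.50, 0.31, 0.07])
act_scale_pre = np.array([0.90, 0.01, 0.44])

all_scores = np.array([])
all_scores = np.append(all_scores, l1_scores)
all_scores = np.append(all_scores, act_scale_pre)

# Global ranking: with an overall pruning ratio of ~30%, the threshold is
# the k-th smallest pooled score; anything strictly below it is pruned.
pr = 0.3
k = int(pr * len(all_scores))          # number of entries to prune
threshold = np.sort(all_scores)[k]     # k-th smallest pooled score
pruned = all_scores < threshold

print(threshold)       # -> 0.07
print(pruned.sum())    # -> 2
```

Note that such a threshold is only meaningful if the two score families live on comparable scales, which is essentially the point under discussion here.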
I'm sorry, I just started working on pruning-related tasks, so there may be issues with how I express myself. Do you mean that the L1-norm, computed as the mean absolute value of the filter weights, and act_scale_pre, computed from the scaling factors, can be combined for importance ranking?
```python
all_scores = np.append(all_scores, score)
if hasattr(module, 'act_scale_pre'):
    all_scores = np.append(all_scores, out["act_scale_pre"])
```
The score can be set as l1-norm or act_scale. I understand that when it is set as act_scale, it can be used for importance ranking together with the act_scale_pre in the third line. However, when it is set as l1-norm, is it correct to perform importance ranking together with the act_scale_pre in the third line?
No. The act_scale_pre item is also computed with the L1-norm, just like act_scale; please see the get_score_layer function.
Suppose criterion='l1-norm' in get_score_layer(name, module, wg='filter', criterion='l1-norm'). Then at the second-to-last line of get_score_layer, out['score'] = out[criterion] becomes out['score'] = out['l1-norm']. The value of out['l1-norm'] is calculated by l1 = module.weight.abs().view(-1, num_fea * scale * scale, 3, 3).mean(dim=[1, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=1). Therefore, out['score'] represents the score obtained from the weights of the filters.
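As a sanity check on that branch, the non-upconv per-filter case (l1 = module.weight.abs().mean(dim=[1, 2, 3])) can be mimicked in plain numpy; the (8, 4, 3, 3) weight shape below is an assumed example, not taken from the repository:

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed conv weight: (out_channels, in_channels, kH, kW)
weight = rng.standard_normal((8, 4, 3, 3))

# Per-filter score: mean absolute weight over the input-channel and
# spatial dimensions, giving exactly one score per output filter.
l1 = np.abs(weight).mean(axis=(1, 2, 3))

print(l1.shape)  # -> (8,)
```

So with wg='filter' and criterion='l1-norm', out['score'] has one entry per output filter, which is why it can be appended next to per-channel scaling-factor magnitudes in all_scores.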
So the two lines in the outer function, score = out['score'] and all_scores = np.append(all_scores, score, axis=0), mean that the mean absolute value of the filter weights is added to all_scores. On the other hand, all_scores = np.append(all_scores, out["act_scale_pre"]) adds the L1-norm of the scaling factors. The difference lies in these two types of importance scores. Can these two different importance scores be combined for ranking?
```python
def get_score_layer(name, module, wg='filter', criterion='l1-norm'):
    r"""Get importance score for a layer.
    Return:
        out (dict): A dict that has key 'score', whose value is a numpy array
    """
    # -- define any scoring scheme here as you like
    shape = module.weight.data.shape
    if "upconv" in name:
        if wg == "channel":
            l1 = module.weight.abs().mean(dim=[0, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=0)
        elif wg == "filter":
            scale = 2
            num_fea = 64
            l1 = module.weight.abs().view(-1, num_fea * scale * scale, 3, 3).mean(dim=[1, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=1)
        elif wg == "weight":
            l1 = module.weight.abs().flatten()
    else:
        if wg == "channel":
            l1 = module.weight.abs().mean(dim=[0, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=0)
        elif wg == "filter":
            l1 = module.weight.abs().mean(dim=[1, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=1)
        elif wg == "weight":
            l1 = module.weight.abs().flatten()
    # --
    out = {}
    out['l1-norm'] = tensor2array(l1)
    if "upconv" in name:
        out['act_scale'] = tensor2array(module.act_scale.abs().view(-1)) if hasattr(module, 'act_scale') else [1e30] * (module.weight.size(0) // 4)
        if hasattr(module, 'act_scale_pre'):
            out['act_scale_pre'] = tensor2array(module.act_scale_pre.abs().view(-1))
        else:
            out['act_scale_pre'] = [1e30] * module.weight.size(1)
    else:
        out['act_scale'] = tensor2array(module.act_scale.abs().view(-1)) if hasattr(module, 'act_scale') else [1e30] * module.weight.size(0)
        if hasattr(module, 'act_scale_pre'):
            out['act_scale_pre'] = tensor2array(module.act_scale_pre.abs().view(-1))
        else:
            out['act_scale_pre'] = [1e30] * module.weight.size(1)
    # 1e30 indicates this layer will not be pruned, because of its unusually high score
    out['score'] = out[criterion]
    return out
```
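One subtlety in the wg == 'filter' branch for "upconv" layers: the view(-1, num_fea * scale * scale, 3, 3) reshape groups the weights so that the number of scores matches the feature maps after pixel shuffling, not the raw output channels. A numpy sketch of just that shape arithmetic (the (256, 64, 3, 3) weight shape is an assumed example for num_fea=64, scale=2, i.e. a conv feeding a PixelShuffle(2) step; this is not the repository's code):

```python
import numpy as np

num_fea, scale = 64, 2
# Assumed upconv weight: num_fea*scale^2 output channels, num_fea input channels.
weight = np.random.default_rng(1).standard_normal((num_fea * scale * scale, num_fea, 3, 3))

# Plain per-filter scoring would give one score per raw output channel.
per_channel = np.abs(weight).mean(axis=(1, 2, 3))

# The grouped reshape instead yields one score per post-shuffle feature map.
grouped = np.abs(weight).reshape(-1, num_fea * scale * scale, 3, 3).mean(axis=(1, 2, 3))

print(per_channel.shape)  # -> (256,)
print(grouped.shape)      # -> (64,)
```

This matches the act_scale fallback for upconv layers, which also produces module.weight.size(0) // 4 entries rather than size(0).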
Should --prune_criterion l1-norm be added in ./scripts/dist_train.sh? I noticed that the default prune_criterion is act_scale: parser.add_argument('--prune_criterion', type=str, default='act_scale', choices=['l1-norm', 'act_scale']). As I understand it, the entire pruning process is as follows: first, for a well-trained model, the L1 norms of the convolutional kernel weights are used to select which layers are planned to be pruned; then, sparse training is performed by incorporating the scaling factors, targeting the scaling factors that correspond to the unimportant parts; finally, pruning is executed to remove them. I'm not sure whether my understanding is accurate.