zxd-cqu opened this issue 1 year ago
Yes, more details can be found in the research paper and source code.
However, directly modifying the code to use l1-norm as the criterion does not seem to work, because the l1-norm of the filters cannot be used to prune the input to the first convolutional layer of residual blocks...
I don't follow your point. That case is actually handled in the code; please take a look:
```python
for name, module in model.named_modules():
    if name in layers:
        layer = layers[name]
        out = get_score_layer(name, module, wg=wg, criterion=criterion)
        score = out['score']
        layer.score = score
        layer.prescore = out['act_scale_pre']
        if raw_pr[name] > 0:  # pr > 0 indicates we want to prune this layer, so its score will be included in <all_scores>
            all_scores = np.append(all_scores, score)
            if hasattr(module, 'act_scale_pre'):
                all_scores = np.append(all_scores, out["act_scale_pre"])
```
In this code snippet, if the criterion is set to l1-norm, then score is the L1-norm of the filters, and it is appended to the all_scores array. In addition, all_scores = np.append(all_scores, out["act_scale_pre"]) appends the scaling factors to all_scores as scores as well. These seem to be two different kinds of scores, yet they are combined for importance ranking.
They essentially represent the same thing: the importance of filters. The separate naming is simply for code organization and readability. I still don't see your issue.
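For context, pooled scores like these are typically turned into a pruning decision with a single global threshold: sort everything in all_scores and prune whatever falls below the k-th smallest value. A minimal numpy sketch of that global-ranking step (the score values and the 30% pruning ratio are made up for illustration; this is not the repository's code):

```python
import numpy as np

# Hypothetical per-filter scores from two sources: filter L1-norms and
# scaling-factor magnitudes, pooled into one array as in the snippet above.
l1_scores = np.array([0.02, 0.50, 0.31, 0.07])
act_scale_pre = np.array([0.90, 0.01, 0.44])

all_scores = np.array([])
all_scores = np.append(all_scores, l1_scores)
all_scores = np.append(all_scores, act_scale_pre)

# Global ranking: with an overall pruning ratio of ~30%, the threshold is
# the k-th smallest pooled score; anything strictly below it is pruned.
pr = 0.3
k = int(pr * len(all_scores))          # number of entries to prune
threshold = np.sort(all_scores)[k]     # k-th smallest pooled score
pruned = all_scores < threshold

print(threshold)       # -> 0.07
print(pruned.sum())    # -> 2
```

Note that such a threshold is only meaningful if the two score families live on comparable scales, which is essentially the point under discussion here.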
I'm sorry, I just started working on pruning-related tasks, so there may be issues with how I express myself. Do you mean that the L1-norm, computed as the mean absolute value of the filter weights, and act_scale_pre, computed from the scaling factors, can be combined for importance ranking?
```python
all_scores = np.append(all_scores, score)
if hasattr(module, 'act_scale_pre'):
    all_scores = np.append(all_scores, out["act_scale_pre"])
```
The score can be set as l1-norm or act_scale. I understand that when it is set as act_scale, it can be used for importance ranking together with the act_scale_pre in the third line. However, when it is set as l1-norm, is it correct to perform importance ranking together with the act_scale_pre in the third line?
No. The act_scale_pre item is also computed with the L1-norm, just like act_scale; please see the get_score_layer function.
Suppose criterion='l1-norm' in get_score_layer(name, module, wg='filter', criterion='l1-norm'). Then at the second-to-last line of get_score_layer, out['score'] = out[criterion] becomes out['score'] = out['l1-norm']. The value of out['l1-norm'] is calculated by l1 = module.weight.abs().view(-1, num_fea * scale * scale, 3, 3).mean(dim=[1, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=1). Therefore, out['score'] represents the score obtained from the weights of the filters.
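As a sanity check on that branch, the non-upconv per-filter case (l1 = module.weight.abs().mean(dim=[1, 2, 3])) can be mimicked in plain numpy; the (8, 4, 3, 3) weight shape below is an assumed example, not taken from the repository:

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed conv weight: (out_channels, in_channels, kH, kW)
weight = rng.standard_normal((8, 4, 3, 3))

# Per-filter score: mean absolute weight over the input-channel and
# spatial dimensions, giving exactly one score per output filter.
l1 = np.abs(weight).mean(axis=(1, 2, 3))

print(l1.shape)  # -> (8,)
```

So with wg='filter' and criterion='l1-norm', out['score'] has one entry per output filter, which is why it can be appended next to per-channel scaling-factor magnitudes in all_scores.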
So the two lines in the outer function, score = out['score'] and all_scores = np.append(all_scores, score, axis=0), mean that the mean absolute value of the filter weights is added to all_scores. On the other hand, all_scores = np.append(all_scores, out["act_scale_pre"]) adds the L1-norm of the scaling factors. The difference lies in these two types of importance scores. Can these two different importance scores be combined for ranking?
```python
def get_score_layer(name, module, wg='filter', criterion='l1-norm'):
    r"""Get importance score for a layer.
    Return:
        out (dict): A dict that has key 'score', whose value is a numpy array
    """
    # -- define any scoring scheme here as you like
    shape = module.weight.data.shape
    if "upconv" in name:
        if wg == "channel":
            l1 = module.weight.abs().mean(dim=[0, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=0)
        elif wg == "filter":
            scale = 2
            num_fea = 64
            l1 = module.weight.abs().view(-1, num_fea * scale * scale, 3, 3).mean(dim=[1, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=1)
        elif wg == "weight":
            l1 = module.weight.abs().flatten()
    else:
        if wg == "channel":
            l1 = module.weight.abs().mean(dim=[0, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=0)
        elif wg == "filter":
            l1 = module.weight.abs().mean(dim=[1, 2, 3]) if len(shape) == 4 else module.weight.abs().mean(dim=1)
        elif wg == "weight":
            l1 = module.weight.abs().flatten()
    # --
    out = {}
    out['l1-norm'] = tensor2array(l1)
    if "upconv" in name:
        out['act_scale'] = tensor2array(module.act_scale.abs().view(-1)) if hasattr(module, 'act_scale') else [1e30] * (module.weight.size(0) // 4)
        if hasattr(module, 'act_scale_pre'):
            out['act_scale_pre'] = tensor2array(module.act_scale_pre.abs().view(-1))
        else:
            out['act_scale_pre'] = [1e30] * module.weight.size(1)
    else:
        out['act_scale'] = tensor2array(module.act_scale.abs().view(-1)) if hasattr(module, 'act_scale') else [1e30] * module.weight.size(0)
        if hasattr(module, 'act_scale_pre'):
            out['act_scale_pre'] = tensor2array(module.act_scale_pre.abs().view(-1))
        else:
            out['act_scale_pre'] = [1e30] * module.weight.size(1)
    # 1e30 indicates this layer will not be pruned, because of its unusually high score
    out['score'] = out[criterion]
    return out
```
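One subtlety in the wg == 'filter' branch for "upconv" layers: the view(-1, num_fea * scale * scale, 3, 3) reshape groups the weights so that the number of scores matches the feature maps after pixel shuffling, not the raw output channels. A numpy sketch of just that shape arithmetic (the (256, 64, 3, 3) weight shape is an assumed example for num_fea=64, scale=2, i.e. a conv feeding a PixelShuffle(2) step; this is not the repository's code):

```python
import numpy as np

num_fea, scale = 64, 2
# Assumed upconv weight: num_fea*scale^2 output channels, num_fea input channels.
weight = np.random.default_rng(1).standard_normal((num_fea * scale * scale, num_fea, 3, 3))

# Plain per-filter scoring would give one score per raw output channel.
per_channel = np.abs(weight).mean(axis=(1, 2, 3))

# The grouped reshape instead yields one score per post-shuffle feature map.
grouped = np.abs(weight).reshape(-1, num_fea * scale * scale, 3, 3).mean(axis=(1, 2, 3))

print(per_channel.shape)  # -> (256,)
print(grouped.shape)      # -> (64,)
```

This matches the act_scale fallback for upconv layers, which also produces module.weight.size(0) // 4 entries rather than size(0).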
Should --prune_criterion l1-norm be added in ./scripts/dist_train.sh? I noticed that the default prune_criterion is act_scale: parser.add_argument('--prune_criterion', type=str, default='act_scale', choices=['l1-norm', 'act_scale']). As I understand it, the entire pruning process is as follows: first, for a well-trained model, the L1 norms of the convolutional kernel weights are used to select which layers are planned to be pruned; then, sparse training is performed by incorporating the scaling factors, targeting the scaling factors that correspond to the unimportant parts; finally, pruning is executed to remove them. I'm not sure whether my understanding is accurate.