IntelLabs / distiller

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
Apache License 2.0

Pruner without thinning can not detect successor BN or Conv layers? #174

Closed · bezorro closed this issue 5 years ago

bezorro commented 5 years ago

Hi, thanks for providing this great DNN compression framework. I am pruning MobileNetV1, whose basic block is:

import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    """MobileNetV1 depthwise-separable block: depthwise 3x3 conv + pointwise 1x1 conv."""
    def __init__(self, in_planes, out_planes, stride=1):
        super(Block, self).__init__()
        # Depthwise 3x3 convolution
        self.conv1 = nn.Conv2d(in_planes, in_planes, kernel_size=3, stride=stride, padding=1, groups=in_planes, bias=False)
        self.bn1 = nn.BatchNorm2d(in_planes)
        # Pointwise 1x1 convolution
        self.conv2 = nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        return out

With the YAML file mobilenet.schedule.yaml

version: 1
pruners:
  pruner_base_5:
    class: 'L1RankedStructureParameterPruner'
    group_type: Filters
    desired_sparsity: 0.8
    weights: [layers.3.conv2.weight]

policies:
  - pruner:
      instance_name: pruner_base_5
    epochs: [1]

I load a baseline model, train it for several epochs with this schedule, and save a checkpoint. Then I load the saved checkpoint to see what is in it. I find that layers.3.conv2.weight is pruned as expected: only 0.2 of its filters are nonzero. But the successor BN and Conv layers (layers.3.bn2.weight, layers.4.conv1.weight) are not pruned; all of their channels are nonzero. I've read the pruning code, and it seems that pruners cannot detect successor BN or Conv layers and adjust their parameters accordingly before thinning. Does that mean that if I do not perform thinning, filter pruning cannot be performed correctly?
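
For reference, a check along these lines shows the per-filter sparsity in the checkpoint (a minimal sketch: the checkpoint filename, the 'state_dict' key, and the absence of a 'module.' prefix in the parameter names are assumptions about my setup):

import torch

# Count how many filters/channels of each tensor are still nonzero.
ckpt = torch.load('checkpoint.pth.tar', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)

def nonzero_fraction(w):
    if w.dim() == 4:                                  # Conv weight: (out, in, kH, kW)
        alive = w.flatten(1).abs().sum(dim=1) != 0    # per-filter
    else:                                             # BN weight/bias: (channels,)
        alive = w != 0
    return alive.float().mean().item()

for name in ['layers.3.conv2.weight',    # the pruned layer
             'layers.3.bn2.weight',      # successor BN
             'layers.4.conv1.weight']:   # successor (depthwise) Conv
    print(name, nonzero_fraction(state_dict[name]))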

nzmora commented 5 years ago

Hi @bezorro,

You described the pruning behavior correctly: pruners create "direct" sparsity, but they don't actually change the network structure. Thinning is a process that can follow pruning, to perform "neural garbage collection" and physically remove structures and parameters from the network, based on the sparsity it sees in the network and the data-dependencies. I describe this a bit here and also in issue #73. This is a design choice, not a bug, but your question makes me wonder if zeroing dependent data (e.g. successor BN and Conv layers), without removing them, would help in pruning. Maybe it's worth trying.
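
For illustration, zeroing the dependent parameters by hand could look roughly like this (a plain PyTorch sketch, not Distiller code; it only propagates zeros and does not change any tensor shapes, which is what thinning would do):

import torch

def zero_dependents(conv, bn, next_conv):
    # Treat a filter as pruned if all of its weights are zero.
    with torch.no_grad():
        pruned = conv.weight.flatten(1).abs().sum(dim=1) == 0   # (out_channels,)

        # With gamma=0 and beta=0 the BN output of a pruned channel is exactly zero.
        bn.weight[pruned] = 0
        bn.bias[pruned] = 0

        # Zero the next conv's weights that read the pruned channels.
        if next_conv.groups == 1:
            next_conv.weight[:, pruned] = 0
        else:
            # Depthwise conv (as in the MobileNet Block above): filter i reads only channel i.
            next_conv.weight[pruned] = 0

# Hypothetical usage for the model above, assuming `layers` is an indexable container:
# zero_dependents(model.layers[3].conv2, model.layers[3].bn2, model.layers[4].conv1)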

Here's an example of adding an explicit "thinning" step (source):

extensions:
  net_thinner:
      class: 'FilterRemover'
      thinning_func_str: remove_filters
      arch: 'mobilenet'
      dataset: 'imagenet'

policies:
    # After completing the pruning, we perform network thinning and continue fine-tuning.
  - extension:
      instance_name: net_thinner
    epochs: [2]

Cheers, Neta

bezorro commented 5 years ago

Thanks for answering. It helps me a lot. But since pruners cannot mask successor BN and Conv layers, two problems occur:

  1. The sensitivity analysis in this repo only applies the mask to the current layer, ignoring the successor BN and Conv layers. Thus, the sensitivity analysis results may be wrong.
  2. In your example of the "thinning" step (source), thinning is applied at epoch [212], but the 'fc pruner' and 'fine_pruner' end at epoch [230]. So after epoch [212], are those pruners meaningless?

nzmora commented 5 years ago

Hi @bezorro,

Sorry for the late reply - I didn't see your comment earlier.

You are correct about (1). I don't like sensitivity analysis that much because it treats the weights/filters as i.i.d. (i.e. SA ignores the inter-dependencies between layers), so after I wrote the "thinning" feature I didn't go back to update the SA code. But the concern you raise is valid, especially in networks that have non-serial data-dependencies, where certain layers have inputs that depend on more than one layer (e.g. ResNet, DenseNet, etc.). In such cases, if you remove a filter from a layer, you may need to change more than one dependent BN and Conv layer (e.g. in ResNet there are some long dependency chains that include 7-8 dependent convolutions). If this is not clear, I can try to send you a diagram.
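
To make the data-dependency concrete for the simplest serial case (Conv -> BN -> 1x1 Conv), here is a rough sketch of which dimensions physical filter removal has to touch. It is not Distiller's thinning code, and it deliberately ignores the longer dependency chains mentioned above (it also assumes groups=1 and bias-free convolutions):

import torch
import torch.nn as nn

def remove_filters_serial(conv, bn, next_conv, keep):
    """Return new (conv, bn, next_conv) modules with only the filters in `keep` retained."""
    keep = torch.as_tensor(keep)
    n = len(keep)

    new_conv = nn.Conv2d(conv.in_channels, n, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding, bias=False)
    new_conv.weight.data = conv.weight.data[keep].clone()           # out-channel dim shrinks

    new_bn = nn.BatchNorm2d(n)                                      # every per-channel BN tensor shrinks
    new_bn.weight.data = bn.weight.data[keep].clone()
    new_bn.bias.data = bn.bias.data[keep].clone()
    new_bn.running_mean = bn.running_mean[keep].clone()
    new_bn.running_var = bn.running_var[keep].clone()

    new_next = nn.Conv2d(n, next_conv.out_channels, next_conv.kernel_size,
                         stride=next_conv.stride, padding=next_conv.padding, bias=False)
    new_next.weight.data = next_conv.weight.data[:, keep].clone()   # only the *input*-channel dim changes
    return new_conv, new_bn, new_next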

Regarding (2): this indeed looks like a bug (but it is not :-)). Good catch nonetheless!
It's nuanced, so I will explain in-depth:

We do, however, perform implicit thinning of FC layers. What do I mean by that? Look at the difference between this and this:

I hope this helped, Neta

bezorro commented 5 years ago

Hi @nzmora, I read and tested your thinning code and I fully understand what you said. Thanks for your reply! It helps me a lot.