Hi @bezorro,
You described the pruning behavior correctly: pruners create "direct" sparsity, but they don't actually change the network structure. Thinning is a process that can follow pruning, to perform "neural garbage collection" and physically remove structures and parameters from the network - based on the sparsity it sees in the network and the data-dependencies. I describe this a bit here and also in issue #73. This is a design choice, not a bug, but your question makes me wonder if zeroing dependent data (e.g. successor BN and Conv layers), without removing them, would help in pruning. Maybe it's worth trying.
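In case it helps, here is a minimal PyTorch sketch of that "zero the dependent data without removing it" idea. The layers are toy stand-ins and the pruned filter indices are hypothetical; this is not Distiller code:

import torch
import torch.nn as nn

# Toy layers standing in for a pruned Conv, its BN, and the successor Conv.
conv      = nn.Conv2d(3, 8, kernel_size=3, padding=1)
bn        = nn.BatchNorm2d(8)
next_conv = nn.Conv2d(8, 16, kernel_size=3, padding=1)

# Hypothetical: a filter pruner zeroed output filters 1 and 3 of `conv`.
pruned_filters = [1, 3]

with torch.no_grad():
    conv.weight[pruned_filters] = 0        # what the filter pruner already did
    # Zero the dependent data instead of physically removing it:
    bn.weight[pruned_filters] = 0          # BN scale (gamma)
    bn.bias[pruned_filters] = 0            # BN shift (beta)
    bn.running_mean[pruned_filters] = 0
    # Zero the matching input channels of the successor Conv
    # (weight layout: [out_channels, in_channels, kH, kW]).
    next_conv.weight[:, pruned_filters, :, :] = 0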
Here's an example of adding an explicit "thinning" step (source):
extensions:
  net_thinner:
    class: 'FilterRemover'
    thinning_func_str: remove_filters
    arch: 'mobilenet'
    dataset: 'imagenet'

policies:
  # After completing the pruning, we perform network thinning and continue fine-tuning.
  - extension:
      instance_name: net_thinner
    epochs: [2]
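For intuition about what such a filter-removal step does, here is a conceptual PyTorch sketch (toy layers and filter indices, not the actual remove_filters implementation): the pruned layer is rebuilt with fewer output channels, and the successor layer shrinks its input channels to match.

import torch
import torch.nn as nn

conv      = nn.Conv2d(3, 8, kernel_size=3, padding=1)
next_conv = nn.Conv2d(8, 16, kernel_size=3, padding=1)

keep = [0, 2, 4, 5, 6, 7]   # indices of the filters that survived pruning (hypothetical)

# Rebuild `conv` with fewer output channels, copying the surviving filters.
new_conv = nn.Conv2d(3, len(keep), kernel_size=3, padding=1)
new_conv.weight.data.copy_(conv.weight.data[keep])
new_conv.bias.data.copy_(conv.bias.data[keep])

# The successor Conv must shrink its input channels to match.
new_next = nn.Conv2d(len(keep), 16, kernel_size=3, padding=1)
new_next.weight.data.copy_(next_conv.weight.data[:, keep, :, :])
new_next.bias.data.copy_(next_conv.bias.data)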
Cheers, Neta
Thanks for answering, it helps me a lot. But since pruners cannot mask the successor BN and Conv layers, two problems now occur.
Hi @bezorro,
Sorry for the late reply - I didn't see your reply.
You are correct about (1). I don't like sensitivity analysis that much because it treats the weights/filters as i.i.d. (i.e. SA ignores the inter-dependencies between layers), so after I wrote the "thinning" feature I didn't go back to update the SA code. But the concern you raise is valid, especially in networks that have non-serial data-dependencies - where certain layers have inputs that are dependent on more than one layer (e.g. ResNet, DenseNet, etc.). In such cases, if you remove a filter of a layer, you may need to change more than one dependent BN and Conv layer (e.g. in ResNet there are some long dependency chains that include 7-8 dependent convolutions). If this is not clear, I can try to send you a diagram.
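To illustrate the kind of dependency the element-wise add creates, here is a toy PyTorch sketch (made-up block, not Distiller or torchvision code):

import torch
import torch.nn as nn

# Toy residual block: because of the element-wise add in forward(), conv2's
# output channels, the identity path's channels, and the input channels of
# whatever consumes the block's output must all stay in sync.
class TinyResidualBlock(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1   = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2   = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + identity)   # the add ties the channel counts together

# Removing filter k from conv2 therefore also requires removing channel k from
# bn2, from the layer that produced x, and from the in_channels of every layer
# that consumes this block's output. Chains of such blocks are why the
# dependency set can span many convolutions.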
Regarding (2): this does indeed look like a bug (but it is not :-) - good catch nonetheless!
It's nuanced, so I will explain in depth:
- fine_pruner is a fine-pruner (i.e. an element-wise pruner), so it is not affected by "thinning" (because the sizes of the filters/channels remain the same when doing element-wise pruning, we can't make the network physically smaller).
- fc_pruner prunes rows. The "thinning" feature does not support thinning of FC/Linear layers explicitly (as in this case, where we explicitly prune rows). I didn't write the code to explicitly "thinnify" FC layers, because they are not significant in today's CNNs (however, FC/Linear layers are important in other types of DNNs). We do, however, perform implicit thinning of FC layers. What do I mean by that? Look at the difference between this and this:
  - In the first example, NNZ (dense) = 640 and NNZ (sparse) = 320. This means that we have 640-320=320 zeros in the FC weights.
  - In the second example, NNZ (dense) = 320 and NNZ (sparse) = 160. This means that we have 320-160=160 zeros in the FC weights. But NNZ (dense) changed from 640 to 320. Why? Because we (physically) removed 50% of the rows (row thinning). But I said above that we don't explicitly thinnify rows, so what happened? Well, in the 2nd example low_pruner_2 removes 50% of the filters of module.layer3.2.conv2.weight. Each of these filters corresponds to one row of the FC layer that follows (if it is not clear why, I can explain). So when we thinnify module.layer3.2.conv2.weight, we also implicitly thinnify the rows of module.fc.weight (see the quick arithmetic check after this list).
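A quick sanity check on those NNZ numbers, assuming (purely as an inference from 640 = 10 x 64) a final FC layer that maps 64 channels to 10 classes:

import torch.nn as nn

# Hypothetical shapes, inferred only from the NNZ numbers quoted above.
fc = nn.Linear(64, 10)          # weight shape [10, 64] -> 640 elements = NNZ (dense) before thinning
# fc_pruner zeros 50% of these weights           -> NNZ (sparse) = 320

# low_pruner_2 removes 50% of the 64 filters of the preceding conv, so the
# matching 32 input features of the FC layer are removed as well:
thinned_fc = nn.Linear(32, 10)  # weight shape [10, 32] -> 320 elements = NNZ (dense) after thinning
# With fc_pruner's 50% sparsity still in place   -> NNZ (sparse) = 160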
I hope this helped, Neta
Hi @nzmora, I read and tested your code for thinning and I fully understand what you said. Thanks for your reply! It helps me a lot.
Hi, thanks for providing this great DNN compression framework. I am pruning MobileNetV1 with the YAML file mobilenet.schedule.yaml.
I load a baseline model, train for several epochs, and save a checkpoint. Then I load the saved checkpoint to see what is in it. I find that layers.3.conv2.weight is pruned as expected: only 0.2 of its filters are nonzero. But the successor BN and Conv layers (layers.3.bn2.weight, layers.4.conv1.weight) are not pruned; all of their channels are nonzero. I've read the code for pruning. It seems that pruners cannot detect successor BN or Conv layers and adjust their parameters accordingly before thinning. Does that mean that if I do not perform thinning, filter pruning cannot be performed correctly?
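For anyone who wants to reproduce this kind of checkpoint inspection, here is a minimal sketch. The checkpoint filename is hypothetical, and whether the weights sit under a 'state_dict' key or carry a 'module.' prefix depends on how the checkpoint was saved:

import torch

ckpt = torch.load('checkpoint.pth.tar', map_location='cpu')   # hypothetical filename
state = ckpt.get('state_dict', ckpt)   # fall back to treating the file as a raw state_dict

def nonzero_filters(w4d):
    # Conv weight shape is [out_channels, in_channels, kH, kW];
    # a filter counts as nonzero if any of its elements is nonzero.
    return int((w4d.abs().sum(dim=(1, 2, 3)) != 0).sum())

conv_w = state['layers.3.conv2.weight']   # may need a 'module.' prefix, depending on how the model was wrapped
print('nonzero filters: %d / %d' % (nonzero_filters(conv_w), conv_w.shape[0]))

bn_w = state['layers.3.bn2.weight']       # BN weight is 1-D: one scale per channel
print('nonzero BN channels: %d / %d' % (int((bn_w != 0).sum()), bn_w.shape[0]))

next_w = state['layers.4.conv1.weight']   # successor conv: input channels live in dim=1
# (for a depthwise conv, which stores only one input channel per filter, check dim=0 instead)
print('nonzero input channels: %d / %d' %
      (int((next_w.abs().sum(dim=(0, 2, 3)) != 0).sum()), next_w.shape[1]))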