Hi @huangbiubiu , thanks for your interest in our work! Here are my suggestions:
Can you analyse the distribution of the scaling factors in the BN layers of your trained models, like Figure 4 in the network slimming paper? Normally, network slimming works by making the scaling factors of the BN layers sparse.
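A minimal sketch of such a check (the toy model and the near-zero threshold `eps` are just placeholders; swap in your trained network):

```python
import torch
import torch.nn as nn

def bn_scale_stats(model, eps=1e-2):
    """Collect |gamma| from every BatchNorm layer and report the fraction
    that is near zero, i.e. prunable under network slimming."""
    gammas = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            gammas.append(m.weight.data.abs().flatten())
    gammas = torch.cat(gammas)
    near_zero = (gammas < eps).float().mean().item()
    return gammas, near_zero

# Toy model just to demonstrate the call; replace with your trained net.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3), nn.BatchNorm2d(32), nn.ReLU(),
)
gammas, frac = bn_scale_stats(model)
print(f"{gammas.numel()} scaling factors, {frac:.1%} below threshold")
```

You can then plot `gammas` with `matplotlib.pyplot.hist` to get a Figure-4-style histogram; if the mass is not concentrated near zero, sparsity training did not take effect.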
Maybe you can do the pruning on the convolution layers and FC layers separately. You can get a 50% pruned model this way.
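The idea can be sketched as computing one threshold per layer group instead of a single global threshold (the grouping and the random scaling factors below are hypothetical; in practice you would read the factors from the BN layers on each side of the network):

```python
import numpy as np

def per_group_thresholds(scales_by_group, ratio=0.5):
    """Compute a separate pruning threshold per layer group so that each
    group loses `ratio` of its channels, instead of one global threshold
    that can wipe out an entire FC layer with small scaling factors."""
    return {name: float(np.quantile(np.abs(s), ratio))
            for name, s in scales_by_group.items()}

# Hypothetical scaling factors: FC-side gammas much smaller than conv-side,
# which is what makes a single global 50% threshold prune FC layers to zero.
rng = np.random.default_rng(0)
scales = {
    "conv": rng.normal(0.0, 1.0, 1000),
    "fc":   rng.normal(0.0, 0.05, 100),
}
th = per_group_thresholds(scales, ratio=0.5)
kept = {g: int((np.abs(s) >= th[g]).sum()) for g, s in scales.items()}
print(th, kept)
```

With a per-group threshold, each group keeps exactly half of its channels, so no single layer collapses to zero channels.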
We trained the network slimming model with the command in https://github.com/Eric-mingjie/rethinking-network-pruning/blob/master/imagenet/network-slimming/README.md#train-with-sparsity and pruned it at a 50% ratio. However, we could not reproduce the results of the models you provided.
More specifically, in our result, `classifier.1.weight` was pruned to 0 channels, while `classifier.4.weight` kept almost all of its original channels.

Pruning result: