ShichenLiu / CondenseNet

CondenseNet: Lightweight CNN for mobile devices
MIT License

Some concerns about the approach used to prune the network #6

Closed: haithanhp closed this issue 6 years ago

haithanhp commented 6 years ago

Hi @gaohuang and @ShichenLiu ,

Thank you for the great work. I have the following concerns after running your code and reading your paper:

Thanks, Hai

ShichenLiu commented 6 years ago

Hi @HaiPhan1991 ,

About the condensation criterion: `i::self.group` means starting at index i and stepping by `self.group`, not slicing from i to `self.group`. The learned group convolution implicitly employs a channel shuffle (the permute layer in Fig. 1 of our paper), so the channels pruned along the first dimension are not contiguous. The weights are shuffled in the group-lasso loss for the same reason.
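A minimal sketch of what that slicing means, assuming a 1x1 convolution weight of shape (out_channels, in_channels) and a group count of `groups`; this is illustrative only, not the repository's implementation:

```python
# Illustrative sketch (assumed shapes, not the CondenseNet code) of how `i::groups`
# selects the rows belonging to one group, and of a group-lasso term computed
# over that shuffled layout.
import torch

out_channels, in_channels, groups = 8, 12, 4
weight = torch.randn(out_channels, in_channels)  # 1x1 conv weight, spatial dims dropped

for i in range(groups):
    rows = list(range(out_channels))[i::groups]       # every `groups`-th filter starting at i
    print(f"group {i} owns output filters {rows}")     # non-contiguous along the first dimension

# Group lasso over the same layout: one L2 norm per (group, input channel) pair,
# so all weights of a group that read the same input channel shrink together.
group_lasso = sum(
    weight[i::groups, :].pow(2).sum(dim=0).clamp(min=1e-12).sqrt().sum()
    for i in range(groups)
)
print("group-lasso loss:", group_lasso.item())
```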

Larger kernels are more complicated; for example, their bias might need to be counted as part of their importance when deciding what to prune. In my opinion, naively pruning weights by their absolute sum may decrease efficiency.
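For reference, the naive criterion mentioned above could be sketched as follows, under assumed layer sizes and with the bias optionally folded in; this is not the authors' procedure:

```python
# Hedged sketch of the naive criterion: rank 3x3 filters by the absolute sum
# (L1 norm) of their weights, optionally treating the bias as part of the importance.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, bias=True)

l1_weights = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per output filter
l1_with_bias = l1_weights + conv.bias.detach().abs()         # bias counted toward importance

prune_ratio = 0.5
k = int(prune_ratio * l1_with_bias.numel())
pruned = torch.topk(l1_with_bias, k, largest=False).indices  # filters with the lowest importance
print("candidate filters to prune:", pruned.tolist())
```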

If you examine Fig. 8 in our paper carefully, you will find that most of the weights in the classifier layer are extremely small. This is also the case on ImageNet, where there are far more classifier parameters (about 2M). By pruning them, we can save a huge number of parameters without any loss in accuracy.
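As a rough illustration only, assuming a 2048-dimensional feature and a 1000-way classifier (roughly the ~2M parameters mentioned above), magnitude-based pruning of the classifier could look like this; the 50% ratio is arbitrary and the setup is not the paper's exact procedure:

```python
# Minimal illustration (assumed setup): zero out small-magnitude classifier
# weights and report how many parameters survive.
import torch
import torch.nn as nn

classifier = nn.Linear(in_features=2048, out_features=1000)  # ~2M parameters, as on ImageNet

with torch.no_grad():
    magnitudes = classifier.weight.abs()
    threshold = torch.quantile(magnitudes.flatten(), 0.5)    # drop the smallest 50% (illustrative)
    mask = magnitudes >= threshold
    classifier.weight.mul_(mask)

kept = int(mask.sum())
print(f"kept {kept} / {mask.numel()} classifier weights")
```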

Best, Shichen