ShichenLiu / CondenseNet

CondenseNet: Lightweight CNN for mobile devices
MIT License

Some concerns about the approach used to prune the network #6

Closed: haithanhp closed this issue 6 years ago

haithanhp commented 6 years ago

Hi @gaohuang and @ShichenLiu ,

Thank you for the great work. I have the following concerns after running your code and reading your paper:

Thanks, Hai

ShichenLiu commented 6 years ago

Hi @HaiPhan1991 ,

About the condensation criterion: `i::self.group` means starting at index i and stepping by `self.group`, not slicing from i to `self.group`. The learned group convolution implicitly employs a channel shuffle (the permute layer in Fig. 1 of our paper), so the channels pruned along the first dimension are not contiguous. The weights are shuffled in the group-lasso loss for the same reason.
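A minimal sketch of what that slicing means, assuming a 1x1 convolution weight of shape (out_channels, in_channels) and a group count of `groups`; this is illustrative only, not the repository's implementation:

```python
# Illustrative sketch (assumed shapes, not the CondenseNet code) of how `i::groups`
# selects the rows belonging to one group, and of a group-lasso term computed
# over that shuffled layout.
import torch

out_channels, in_channels, groups = 8, 12, 4
weight = torch.randn(out_channels, in_channels)  # 1x1 conv weight, spatial dims dropped

for i in range(groups):
    rows = list(range(out_channels))[i::groups]       # every `groups`-th filter starting at i
    print(f"group {i} owns output filters {rows}")     # non-contiguous along the first dimension

# Group lasso over the same layout: one L2 norm per (group, input channel) pair,
# so all weights of a group that read the same input channel shrink together.
group_lasso = sum(
    weight[i::groups, :].pow(2).sum(dim=0).clamp(min=1e-12).sqrt().sum()
    for i in range(groups)
)
print("group-lasso loss:", group_lasso.item())
```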

Larger kernels are more complicated; for example, their bias might need to be counted as part of their importance when deciding what to prune. In my opinion, naively pruning weights by their absolute sum may decrease efficiency.
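For reference, the naive criterion mentioned above could be sketched as follows, under assumed layer sizes and with the bias optionally folded in; this is not the authors' procedure:

```python
# Hedged sketch of the naive criterion: rank 3x3 filters by the absolute sum
# (L1 norm) of their weights, optionally treating the bias as part of the importance.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, bias=True)

l1_weights = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per output filter
l1_with_bias = l1_weights + conv.bias.detach().abs()         # bias counted toward importance

prune_ratio = 0.5
k = int(prune_ratio * l1_with_bias.numel())
pruned = torch.topk(l1_with_bias, k, largest=False).indices  # filters with the lowest importance
print("candidate filters to prune:", pruned.tolist())
```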

If you examine Fig. 8 in our paper carefully, you will find that most of the weights in the classifier layer are extremely small. This is also the case on ImageNet, where there are far more classifier parameters (about 2M). By pruning them, we can save a huge number of parameters without any loss in accuracy.
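As a rough illustration only, assuming a 2048-dimensional feature and a 1000-way classifier (roughly the ~2M parameters mentioned above), magnitude-based pruning of the classifier could look like this; the 50% ratio is arbitrary and the setup is not the paper's exact procedure:

```python
# Minimal illustration (assumed setup): zero out small-magnitude classifier
# weights and report how many parameters survive.
import torch
import torch.nn as nn

classifier = nn.Linear(in_features=2048, out_features=1000)  # ~2M parameters, as on ImageNet

with torch.no_grad():
    magnitudes = classifier.weight.abs()
    threshold = torch.quantile(magnitudes.flatten(), 0.5)    # drop the smallest 50% (illustrative)
    mask = magnitudes >= threshold
    classifier.weight.mul_(mask)

kept = int(mask.sum())
print(f"kept {kept} / {mask.numel()} classifier weights")
```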

Best, Shichen