Closed lizhenstat closed 4 years ago
Hi,
During testing, we explicitly shuffle the feature channels. However, during training, we implicitly shuffle the feature channels by choosing which kernels to drop. The two are mathematically equivalent.
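To illustrate the equivalence claimed above: for a 1x1 convolution, permuting the output feature maps is the same as permuting the kernels (rows of the weight matrix) before convolving. A minimal numpy sketch (the shapes and permutation are illustrative, not taken from the repo):

```python
import numpy as np

# At one spatial position, a 1x1 conv over C_in channels is a matrix multiply:
# out[c] = sum_k W[c, k] * x[k].  So permuting the output feature maps equals
# permuting the rows (kernels) of W first.
rng = np.random.default_rng(0)
C_out, C_in = 8, 4
W = rng.standard_normal((C_out, C_in))   # hypothetical 1x1 conv weights
x = rng.standard_normal(C_in)            # one pixel's input channels

perm = rng.permutation(C_out)            # an arbitrary channel shuffle

shuffled_outputs = (W @ x)[perm]         # shuffle the feature maps afterwards
shuffled_kernels = W[perm] @ x           # shuffle the kernels beforehand

assert np.allclose(shuffled_outputs, shuffled_kernels)
```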
@ShichenLiu Thanks for your reply. I still have two questions: (1) I understand the equivalence of shuffling feature maps and shuffling kernels, but I still don't see why the shuffle operation is necessary here. (I understand why ShuffleNet shuffles its output feature maps: it needs the kernels in each group to receive inputs coming from different groups, which increases input variety.) For the learned group convolution here, each kernel already learns its "important" input feature maps during training, so why is this operation still needed?
(2) Is the mask updated in the following way? It seems the weights have been shuffled, but the corresponding mask has not. I am not sure whether I understand it correctly (I checked the mask at different stages):
```python
self._mask[i::self.groups, d, :, :].fill_(0)
```
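For what it's worth, here is what that strided slice selects. With `groups` groups, `i::self.groups` picks every `groups`-th filter starting at `i`, i.e. one filter per group under an interleaved channel ordering. A numpy sketch with made-up sizes (`groups=4`, 8 output channels are assumptions for illustration):

```python
import numpy as np

groups, out_channels, in_channels = 4, 8, 6
mask = np.ones((out_channels, in_channels, 1, 1))

i, d = 1, 2                    # group index i drops input channel d
mask[i::groups, d, :, :] = 0   # numpy analogue of .fill_(0)

# Filters i and i+groups (here 1 and 5) have input channel d masked out.
zeroed = np.where(mask[:, d, 0, 0] == 0)[0]
assert list(zeroed) == [1, 5]
```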
Thanks a lot
@ShichenLiu
Hi, I have a question about the `dropping` function in `layers.py`. I don't understand why learned group convolution still needs the shuffle operation.
https://github.com/ShichenLiu/CondenseNet/blob/master/layers.py#L78
I noticed the shuffle operation mentioned in the first paragraph of Section 4.1: "we permute the output channels of the first 1x1_conv learned group convolution layer, such that the features generated by each of its groups are evenly used by all the groups of the subsequent 3x3 group convolutional layer". However, this operation shuffles feature maps, not convolutional kernels.
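The permutation quoted above is the standard reshape-transpose channel shuffle (as in ShuffleNet): it interleaves the output channels of a grouped layer so that each of its groups feeds every group of the next layer. A small numpy sketch of that operation (not the repo's exact implementation):

```python
import numpy as np

def channel_shuffle(x, groups):
    # Split channels into (groups, channels_per_group), swap the two axes,
    # then flatten back, so consecutive output channels come from
    # different input groups.
    n, c, h, w = x.shape
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))

# Tag 6 channels 0..5; with groups=2, group 0 is {0,1,2}, group 1 is {3,4,5}.
x = np.arange(6).reshape(1, 6, 1, 1)
y = channel_shuffle(x, groups=2)
assert y.flatten().tolist() == [0, 3, 1, 4, 2, 5]  # groups interleaved
```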
Could you explain a little? Thanks in advance.