Closed hiyijian closed 6 years ago
Hi @hiyijian ,
Actually we implicitly include it in LearnedGroupConv
for training-speed considerations, so it is not a mismatch. Specifically, here we drop weights from the shuffled convolution weights, which is equivalent to appending a shuffle layer. The shuffle layer is explicitly appended to CondenseConv
here when the model is converted.
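To illustrate the equivalence described above, here is a minimal sketch (hypothetical, not the repo's actual code): per spatial position, a 1x1 convolution is just `y[i] = sum_j W[i][j] * x[j]`, so shuffling the input channels with a permutation can instead be folded into the weight columns via the inverse permutation, and no explicit shuffle layer is needed.

```python
import random

# Sketch only: shows why a channel shuffle before a 1x1 conv can be
# absorbed into the conv weights themselves. Names and shapes here are
# illustrative assumptions, not CondenseNet's actual implementation.

C, groups = 8, 2
random.seed(0)

x = [random.random() for _ in range(C)]                 # one spatial position
W = [[random.random() for _ in range(C)] for _ in range(C)]  # 1x1 conv weights

# Channel-shuffle permutation: view C channels as (groups x C/groups),
# transpose, flatten -- e.g. [0, 4, 1, 5, 2, 6, 3, 7] for C=8, groups=2.
n = C // groups
perm = [k + n * g for k in range(n) for g in range(groups)]
inv_perm = [0] * C
for j, p in enumerate(perm):
    inv_perm[p] = j

# Option A: explicit shuffle layer, then the 1x1 convolution.
x_shuffled = [x[p] for p in perm]
y_explicit = [sum(W[i][j] * x_shuffled[j] for j in range(C)) for i in range(C)]

# Option B: no shuffle layer; fold the inverse permutation into the
# weight columns instead.
W_folded = [[W[i][inv_perm[j]] for j in range(C)] for i in range(C)]
y_folded = [sum(W_folded[i][j] * x[j] for j in range(C)) for i in range(C)]

assert all(abs(a - b) < 1e-12 for a, b in zip(y_explicit, y_folded))
print("shuffle folded into weights: outputs match")
```

Since the permutation is fixed, dropping (pruning) entries of the shuffled weight tensor during training has the same effect as pruning after an explicit shuffle layer.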
Thanks. So the shuffle layer is only implicitly applied during the first stage. During the first stage, we don't need an explicit shuffle layer?
@hiyijian Yes. Only in the testing stage do we need explicit shuffle layers.
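The explicit test-time shuffle layer just interleaves channels across groups. A minimal sketch (hypothetical helper, not the repo's actual ShuffleLayer), equivalent to the usual reshape-to-(groups, C/groups), transpose, flatten trick:

```python
def channel_shuffle(x, groups):
    """Interleave channels across groups.

    x: list of per-channel values (length divisible by `groups`).
    Equivalent to reshape(groups, C // groups) -> transpose -> flatten.
    """
    c = len(x)
    n = c // groups
    return [x[g * n + k] for k in range(n) for g in range(groups)]

# Usage: 8 channels, 2 groups
print(channel_shuffle(list(range(8)), 2))  # -> [0, 4, 1, 5, 2, 6, 3, 7]
```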
@ShichenLiu , I am still confused about this question. Doesn't the learned feature map already have the learned mapping with the 1x1 conv filter here? Why is a shuffle layer applied here?
Dear @ShichenLiu , I did not find any shuffle-layer-related code in models.condensenet, which uses layers.LearnedGroupConv as LGC. However, the paper clearly says we should use one. Is it a mismatch? Thanks