@Burugi Thank you for the question (and apology for the late response).
Yep, the key to group convolution is that it divides the input channels (C_in) and the output channels (C_out) into G groups.
For each group, we conduct a convolution operation whose number of multiplications is

    (C_in / G) * (C_out / G) * kernel_w * kernel_h * H_out * W_out,

where H_out and W_out denote the height and width of the output feature map. Since we have G such convolutions, the total cost of the G-group convolution is

    G * (C_in / G) * (C_out / G) * kernel_w * kernel_h * H_out * W_out
    = (1 / G) * (C_in * C_out * kernel_w * kernel_h * H_out * W_out),

i.e., exactly 1/G of the cost of a standard convolution.
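To make the reduction concrete, here is a minimal sketch (my own illustration, not part of the original thread) that counts the multiplications in a standard vs. a grouped convolution; the channel counts, kernel size, output size, and G = 4 are assumed example values:

```python
def conv_mults(c_in, c_out, k_h, k_w, h_out, w_out, groups=1):
    """Multiplications for a (grouped) convolution:
    G groups, each mapping C_in/G input channels to C_out/G output channels."""
    per_group = (c_in // groups) * (c_out // groups) * k_h * k_w * h_out * w_out
    return groups * per_group

standard = conv_mults(64, 128, 3, 3, 56, 56)            # G = 1
grouped  = conv_mults(64, 128, 3, 3, 56, 56, groups=4)  # G = 4

print(standard / grouped)  # -> 4.0: exactly G times fewer multiplications
```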
So your understanding of group convolution is correct.
Also, it would be more accurate to write kernel_w*kernel_h rather than kernel^2, because in some cases a non-square kernel window is applied to design a (lightweight) convolutional neural network. For example, some networks approximate a 3x3 convolutional layer by stacking 1x3 and 3x1 convolution layers.
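As a quick sketch of why that factorization helps (again with assumed sizes, not figures from the thread), keeping kernel_h and kernel_w separate lets us compare a 3x3 layer against the stacked 1x3 + 3x1 pair:

```python
def conv_mults(c_in, c_out, k_h, k_w, h_out, w_out):
    return c_in * c_out * k_h * k_w * h_out * w_out

full    = conv_mults(64, 64, 3, 3, 56, 56)      # one 3x3 layer
stacked = (conv_mults(64, 64, 1, 3, 56, 56)     # 1x3 layer, then
           + conv_mults(64, 64, 3, 1, 56, 56))  # 3x1 layer

print(stacked / full)  # -> 0.666...: the factorized pair costs 6/9 of the 3x3
```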
Thank you
Thank you for answering.
As the professor carefully pointed out, I should check the notation in the mathematical proof more accurately.
Again, thank you for taking the time to respond on your day off.
Hyunwoong LIM
This issue is about how group convolution achieves a reduction in computation. We would appreciate it if you could confirm that the math below is correct and let us know of any corrections.
The idea of group convolution is to divide the channels into groups. However, a common misconception is that, since you are only dividing the channels, you are still doing G convolutions, so there is no computational gain.
The reason for this misconception is that not only the input channels but also the kernel's channel dimension must be divided into groups; this is sometimes well illustrated with pictures, but it is easy to miss.
Therefore, I have attached the math for this in the photo below.
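In addition to the photo, here is a hedged sketch (my own illustration, assuming PyTorch is available) showing via the weight shapes that grouping also shrinks each kernel's channel dimension, which is exactly the step the misconception overlooks:

```python
import torch.nn as nn

standard = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
grouped  = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, groups=4)

print(standard.weight.shape)  # torch.Size([128, 64, 3, 3])
print(grouped.weight.shape)   # torch.Size([128, 16, 3, 3]) -- each kernel sees C_in/G channels

# Parameter count (and per-position compute) drops by exactly G = 4:
print(standard.weight.numel() / grouped.weight.numel())  # -> 4.0
```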
Thank you.
Hyunwoong Lim