MASILab / 3DUX-Net


similarities between the weighted sum approach in self-attention and the convolution per-channel basis. #49

Open AN-AN-369 opened 11 months ago

AN-AN-369 commented 11 months ago

Your ideas are great! But I have a question. In the section "Volumetric Depth-wise Convolution with LKs", the paper states: "Inspired by the idea of depth-wise convolution, we have found similarities between the weighted sum approach in self-attention and the convolution per-channel basis." I did not find a clear explanation of this in the article. How should I understand this sentence?

leeh43 commented 10 months ago

Thank you for your interest in our work. Great question! In the Swin Transformer approach, the computation of self-attention within a window closely resembles convolution on a per-channel basis. For example, given a 7x7 window in Swin Transformer, the window is further divided into sub-windows to compute self-attention and capture finer-grained details. The results are then combined as a weighted sum, which parallels the depthwise convolution approach (performing convolution on each channel independently). That is the similarity between the Swin Transformer block and the convolution block.
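To make the analogy concrete, here is a minimal PyTorch sketch, not taken from the 3DUX-Net code (the tensor shapes, the 48-channel width, and the identity q/k/v projections are illustrative assumptions): a depthwise 3D convolution processes each channel independently via `groups=C`, while window self-attention produces each output token as a weighted sum of the value tokens in the window.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Depthwise 3D convolution: groups=C gives one kernel per channel,
# so every channel is convolved independently (per-channel basis).
x = torch.randn(1, 48, 8, 8, 8)                      # (B, C, D, H, W)
dwconv = nn.Conv3d(48, 48, kernel_size=7, padding=3, groups=48)
y_conv = dwconv(x)                                   # (1, 48, 8, 8, 8)

# Window self-attention (single head, one flattened 7x7x7 window):
# each output position is a weighted sum of the value tokens, with
# the weights produced dynamically from query-key similarity.
tokens = torch.randn(1, 343, 48)                     # (B, N, C), N = window size
q = k = v = tokens                                   # identity projections for brevity
attn = F.softmax(q @ k.transpose(-2, -1) / 48 ** 0.5, dim=-1)  # (1, N, N)
y_attn = attn @ v                                    # weighted sum over the window
```

The structural parallel is that both aggregate a local neighborhood per position; the main difference is that the depthwise kernel weights are static and shared across positions, whereas the attention weights are computed dynamically from the input.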