Good question.
I have asked myself the same question and tried to merge them into a single 21 x 21 matrix. However, I could not merge them into a 21 x 1 and a 1 x 21 matrix.
Actually, merging them into a 21 x 21 matrix is more expensive than the current version.
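For intuition, here is a minimal parameter-count sketch (the channel count is made up, and only the largest branch is compared) of a dense 21 x 21 depth-wise kernel versus the decomposed 1 x 21 / 21 x 1 strip pair used in the current version:

```python
import torch.nn as nn

C = 64  # illustrative channel count, not a value from the repo

# A single dense 21 x 21 depth-wise convolution.
full = nn.Conv2d(C, C, kernel_size=21, padding=10, groups=C)

# The decomposed pair used in the current version: 1 x 21 followed by 21 x 1.
strip = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=(1, 21), padding=(0, 10), groups=C),
    nn.Conv2d(C, C, kernel_size=(21, 1), padding=(10, 0), groups=C),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(full))   # 21 * 21 * C + C    = 28288 for C = 64
print(count(strip))  # 2 * (21 * C + C)   =  2816 for C = 64
```

Under this rough comparison the decomposed strips are about an order of magnitude cheaper in parameters (and FLOPs), which is why merging them into a dense 21 x 21 kernel would be more expensive than the current version.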
Thanks for your excellent work! I have a quick question about the model structure.
In https://github.com/Visual-Attention-Network/SegNeXt/blob/b53d60110e5d0f87d3e5420473a386196d519fc4/mmseg/models/backbones/mscan.py#L59-L91
we can see that there are convolutions, element-wise sums, and an element-wise product. However, there is no activation function between these operations. In other words, without a non-linear activation, these ops could be reduced to a single matrix operation.
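To make the reduction concrete, here is a minimal, self-contained sketch (the layer sizes are made up; this is not the repo's code) showing that a stack of convolutions and element-wise sums with no activation in between behaves as a single linear map:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C = 8

# Parallel depth-wise strip convolutions summed together, roughly mirroring the
# multi-scale branches -- convolutions and additions only, no activation.
conv0 = nn.Conv2d(C, C, 5, padding=2, groups=C, bias=False)
conv1 = nn.Conv2d(C, C, (1, 7), padding=(0, 3), groups=C, bias=False)
conv2 = nn.Conv2d(C, C, (7, 1), padding=(3, 0), groups=C, bias=False)

def branch_sum(x):
    u = conv0(x)
    return u + conv1(u) + conv2(u)

x1, x2 = torch.randn(1, C, 32, 32), torch.randn(1, C, 32, 32)
a, b = 2.0, -3.0

# Linearity check: f(a*x1 + b*x2) == a*f(x1) + b*f(x2)
lhs = branch_sum(a * x1 + b * x2)
rhs = a * branch_sum(x1) + b * branch_sum(x2)
print(torch.allclose(lhs, rhs, atol=1e-4))  # True -> the stack acts as one linear op
```

(This check covers the convolution-and-sum part only; the trailing element-wise product with the input would not be captured by a single linear map.)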
I understand that the SpatialAttention module, which wraps the attention module, has a GELU, so the non-linearity can be provided by it.
https://github.com/Visual-Attention-Network/SegNeXt/blob/b53d60110e5d0f87d3e5420473a386196d519fc4/mmseg/models/backbones/mscan.py#L94-L101
But I cannot figure out the reason for only using linear ops inside the attention. Is there a good reason for this, or am I simply missing something here?
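For readers without the file open, the structure being referenced looks roughly like the following. This is a simplified paraphrase written against plain `nn.Module`, not the exact repo code; the kernel sizes follow the SegNeXt paper, and only two of the strip branches are shown:

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Simplified paraphrase of the multi-scale convolutional attention:
    depth-wise convs, element-wise sums, a 1x1 conv, and a final element-wise
    product with the input -- no activation function inside."""

    def __init__(self, dim):
        super().__init__()
        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        # Multi-scale strip-convolution branches (7 / 11 / 21 in the paper;
        # only two branches are shown here to keep the sketch short).
        self.conv_a = nn.Sequential(
            nn.Conv2d(dim, dim, (1, 7), padding=(0, 3), groups=dim),
            nn.Conv2d(dim, dim, (7, 1), padding=(3, 0), groups=dim))
        self.conv_b = nn.Sequential(
            nn.Conv2d(dim, dim, (1, 21), padding=(0, 10), groups=dim),
            nn.Conv2d(dim, dim, (21, 1), padding=(10, 0), groups=dim))
        self.conv_out = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        u = x
        attn = self.conv0(x)
        attn = attn + self.conv_a(attn) + self.conv_b(attn)
        attn = self.conv_out(attn)
        return attn * u  # attention weights gate the input

class SpatialAttention(nn.Module):
    """The wrapper referenced above: 1x1 projection, GELU, the attention
    module, another 1x1 projection, plus a residual shortcut. The GELU is the
    only explicit activation around the attention."""

    def __init__(self, d_model):
        super().__init__()
        self.proj_1 = nn.Conv2d(d_model, d_model, 1)
        self.activation = nn.GELU()
        self.spatial_gating_unit = AttentionModule(d_model)
        self.proj_2 = nn.Conv2d(d_model, d_model, 1)

    def forward(self, x):
        shortcut = x
        x = self.proj_2(self.spatial_gating_unit(self.activation(self.proj_1(x))))
        return x + shortcut

# Quick shape check.
print(SpatialAttention(32)(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```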