Visual-Attention-Network / SegNeXt

Official Pytorch implementations for "SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation" (NeurIPS 2022)
Apache License 2.0

Why is there no activation function in the attention module? #26

Closed RicoSuaveGuapo closed 2 years ago

RicoSuaveGuapo commented 2 years ago

Thanks for your excellent work. I have a quick question about the model structure.

In https://github.com/Visual-Attention-Network/SegNeXt/blob/b53d60110e5d0f87d3e5420473a386196d519fc4/mmseg/models/backbones/mscan.py#L59-L91

we can see that there are convolutions, element-wise additions, and an element-wise product, but no activation function between these operations. In other words, without a non-linear activation, the stacked convolutions could in principle be reduced to a single linear operation.
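For context, the module in question looks roughly like this (a paraphrased sketch of the linked lines; kernel sizes and attribute names follow the released code, so treat it as an approximation):

```python
import torch.nn as nn

class AttentionModule(nn.Module):
    # Paraphrase of mscan.py L59-91: a 5x5 depthwise conv followed by three
    # multi-scale strip-convolution branches (7, 11, 21), summed, a 1x1 conv,
    # and finally an element-wise product with the input.
    def __init__(self, dim):
        super().__init__()
        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.conv0_1 = nn.Conv2d(dim, dim, (1, 7), padding=(0, 3), groups=dim)
        self.conv0_2 = nn.Conv2d(dim, dim, (7, 1), padding=(3, 0), groups=dim)
        self.conv1_1 = nn.Conv2d(dim, dim, (1, 11), padding=(0, 5), groups=dim)
        self.conv1_2 = nn.Conv2d(dim, dim, (11, 1), padding=(5, 0), groups=dim)
        self.conv2_1 = nn.Conv2d(dim, dim, (1, 21), padding=(0, 10), groups=dim)
        self.conv2_2 = nn.Conv2d(dim, dim, (21, 1), padding=(10, 0), groups=dim)
        self.conv3 = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        u = x.clone()
        attn = self.conv0(x)
        attn_0 = self.conv0_2(self.conv0_1(attn))
        attn_1 = self.conv1_2(self.conv1_1(attn))
        attn_2 = self.conv2_2(self.conv2_1(attn))
        attn = attn + attn_0 + attn_1 + attn_2
        attn = self.conv3(attn)
        return attn * u  # no activation function anywhere in this block
```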

I understand that the SpatialAttention module wraps the attention module and applies a GELU, so non-linearity can be provided there.

https://github.com/Visual-Attention-Network/SegNeXt/blob/b53d60110e5d0f87d3e5420473a386196d519fc4/mmseg/models/backbones/mscan.py#L94-L101
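Again a rough paraphrase of those lines, just to show where the GELU sits (names approximate the released code):

```python
import torch.nn as nn

class SpatialAttention(nn.Module):
    # Paraphrase of mscan.py L94-101: 1x1 proj -> GELU -> AttentionModule
    # -> 1x1 proj, with a residual connection around the whole block.
    def __init__(self, d_model):
        super().__init__()
        self.proj_1 = nn.Conv2d(d_model, d_model, 1)
        self.activation = nn.GELU()
        self.spatial_gating_unit = AttentionModule(d_model)  # sketched above
        self.proj_2 = nn.Conv2d(d_model, d_model, 1)

    def forward(self, x):
        shortcut = x
        x = self.proj_1(x)
        x = self.activation(x)  # the only explicit non-linearity in this block
        x = self.spatial_gating_unit(x)
        x = self.proj_2(x)
        return x + shortcut
```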

But I cannot figure out the reason for using only linear ops inside the attention module. Is there a good reason for this, or am I simply missing something here?

MenghaoGuo commented 2 years ago

Good question.

I have asked myself the same question and tried to merge them. Without non-linearities, the convolution branches could in principle be fused into a single 21 x 21 kernel. However, such a merged kernel can no longer be decomposed back into a cheap 21 x 1 plus 1 x 21 pair of strip convolutions.

So merging them into a single 21 x 21 matrix is actually more expensive than the current version.
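A quick back-of-the-envelope comparison (my own arithmetic, assuming depthwise convolutions as in the sketch above and counting multiply-accumulates per channel per output pixel) illustrates the point:

```python
# Cost per channel per output pixel, in multiply-accumulates (MACs),
# assuming all convolutions are depthwise as in the attention module above.

# Current multi-branch design: one 5x5 conv plus three pairs of strip convs.
multi_branch = 5 * 5 + (7 + 7) + (11 + 11) + (21 + 21)
print(multi_branch)  # 103

# A single merged dense 21x21 kernel, i.e. the smallest kernel that could
# absorb all branches once they are fused into one linear operation.
merged = 21 * 21
print(merged)        # 441
```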