Good question.
I have asked myself the same question and tried to merge them into a single 21 x 21 matrix. However, I could not merge them into a 21 x 1 and a 1 x 21 matrix.
Actually, merging them into a 21 x 21 matrix is more expensive than the current version.
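For intuition, here is a minimal parameter-count sketch (the channel count is made up, and only the largest branch is compared) of a dense 21 x 21 depth-wise kernel versus the decomposed 1 x 21 / 21 x 1 strip pair used in the current version:

```python
import torch.nn as nn

C = 64  # illustrative channel count, not a value from the repo

# A single dense 21 x 21 depth-wise convolution.
full = nn.Conv2d(C, C, kernel_size=21, padding=10, groups=C)

# The decomposed pair used in the current version: 1 x 21 followed by 21 x 1.
strip = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=(1, 21), padding=(0, 10), groups=C),
    nn.Conv2d(C, C, kernel_size=(21, 1), padding=(10, 0), groups=C),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(full))   # 21 * 21 * C + C    = 28288 for C = 64
print(count(strip))  # 2 * (21 * C + C)   =  2816 for C = 64
```

Under this rough comparison the decomposed strips are about an order of magnitude cheaper in parameters (and FLOPs), which is why merging them into a dense 21 x 21 kernel would be more expensive than the current version.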
Thanks for your excellent work! I have a quick question about the model structure.
In https://github.com/Visual-Attention-Network/SegNeXt/blob/b53d60110e5d0f87d3e5420473a386196d519fc4/mmseg/models/backbones/mscan.py#L59-L91
we can see that there are convolutions, element-wise sums, and an element-wise product. However, there is no activation function between these operations. In other words, without a non-linear activation, these ops could be reduced to a single matrix operation.
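To make the reduction concrete, here is a minimal, self-contained sketch (the layer sizes are made up; this is not the repo's code) showing that a stack of convolutions and element-wise sums with no activation in between behaves as a single linear map:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C = 8

# Parallel depth-wise strip convolutions summed together, roughly mirroring the
# multi-scale branches -- convolutions and additions only, no activation.
conv0 = nn.Conv2d(C, C, 5, padding=2, groups=C, bias=False)
conv1 = nn.Conv2d(C, C, (1, 7), padding=(0, 3), groups=C, bias=False)
conv2 = nn.Conv2d(C, C, (7, 1), padding=(3, 0), groups=C, bias=False)

def branch_sum(x):
    u = conv0(x)
    return u + conv1(u) + conv2(u)

x1, x2 = torch.randn(1, C, 32, 32), torch.randn(1, C, 32, 32)
a, b = 2.0, -3.0

# Linearity check: f(a*x1 + b*x2) == a*f(x1) + b*f(x2)
lhs = branch_sum(a * x1 + b * x2)
rhs = a * branch_sum(x1) + b * branch_sum(x2)
print(torch.allclose(lhs, rhs, atol=1e-4))  # True -> the stack acts as one linear op
```

(This check covers the convolution-and-sum part only; the trailing element-wise product with the input would not be captured by a single linear map.)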
I understand that the SpatialAttention module, which wraps the attention module, has a GELU, so the non-linearity can be provided by it.
https://github.com/Visual-Attention-Network/SegNeXt/blob/b53d60110e5d0f87d3e5420473a386196d519fc4/mmseg/models/backbones/mscan.py#L94-L101
But I cannot figure out the reason for only using linear ops inside the attention. Is there a good reason for this, or am I simply missing something here?
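For readers without the file open, the structure being referenced looks roughly like the following. This is a simplified paraphrase written against plain `nn.Module`, not the exact repo code; the kernel sizes follow the SegNeXt paper, and only two of the strip branches are shown:

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Simplified paraphrase of the multi-scale convolutional attention:
    depth-wise convs, element-wise sums, a 1x1 conv, and a final element-wise
    product with the input -- no activation function inside."""

    def __init__(self, dim):
        super().__init__()
        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        # Multi-scale strip-convolution branches (7 / 11 / 21 in the paper;
        # only two branches are shown here to keep the sketch short).
        self.conv_a = nn.Sequential(
            nn.Conv2d(dim, dim, (1, 7), padding=(0, 3), groups=dim),
            nn.Conv2d(dim, dim, (7, 1), padding=(3, 0), groups=dim))
        self.conv_b = nn.Sequential(
            nn.Conv2d(dim, dim, (1, 21), padding=(0, 10), groups=dim),
            nn.Conv2d(dim, dim, (21, 1), padding=(10, 0), groups=dim))
        self.conv_out = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        u = x
        attn = self.conv0(x)
        attn = attn + self.conv_a(attn) + self.conv_b(attn)
        attn = self.conv_out(attn)
        return attn * u  # attention weights gate the input

class SpatialAttention(nn.Module):
    """The wrapper referenced above: 1x1 projection, GELU, the attention
    module, another 1x1 projection, plus a residual shortcut. The GELU is the
    only explicit activation around the attention."""

    def __init__(self, d_model):
        super().__init__()
        self.proj_1 = nn.Conv2d(d_model, d_model, 1)
        self.activation = nn.GELU()
        self.spatial_gating_unit = AttentionModule(d_model)
        self.proj_2 = nn.Conv2d(d_model, d_model, 1)

    def forward(self, x):
        shortcut = x
        x = self.proj_2(self.spatial_gating_unit(self.activation(self.proj_1(x))))
        return x + shortcut

# Quick shape check.
print(SpatialAttention(32)(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```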