I noticed that in the code, in MSCASpatialAttention, there is the following forward function:
def forward(self, x):
    """Forward function."""
    shortcut = x.clone()
    x = self.proj_1(x)               # 1x1 convolution
    x = self.activation(x)           # GELU
    x = self.spatial_gating_unit(x)  # MSCAAttention
    x = self.proj_2(x)               # 1x1 convolution
    x = x + shortcut                 # residual connection
    return x
As we can see, after x goes through a 1x1 convolution, the activation (GELU), MSCAAttention, and another 1x1 convolution, the shortcut is added back to x.
But in the figure for the Attention block (corresponding to the MSCASpatialAttention part of the code), this residual-connection-like operation is not drawn.
So I would like to ask: is this operation simply omitted from the diagram?
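To make explicit what the figure omits, here is a minimal runnable sketch of the same data flow as I read it from the code; nn.Identity() is only a placeholder for the actual MSCAAttention module so the sketch runs standalone:

import torch
from torch import nn

class SpatialAttentionSketch(nn.Module):
    """Minimal sketch of the MSCASpatialAttention forward path."""

    def __init__(self, channels):
        super().__init__()
        self.proj_1 = nn.Conv2d(channels, channels, kernel_size=1)  # 1x1 conv
        self.activation = nn.GELU()
        self.spatial_gating_unit = nn.Identity()  # placeholder for MSCAAttention
        self.proj_2 = nn.Conv2d(channels, channels, kernel_size=1)  # 1x1 conv

    def forward(self, x):
        shortcut = x.clone()
        x = self.proj_1(x)
        x = self.activation(x)
        x = self.spatial_gating_unit(x)
        x = self.proj_2(x)
        return x + shortcut  # the residual add that is not drawn in the figure

x = torch.randn(1, 64, 32, 32)
assert SpatialAttentionSketch(64)(x).shape == x.shape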
In addition, the expansion ratios of the FFN are [8, 8, 4, 4] in the paper, but in the code, mlp_ratio = [4, 4, 4, 4] in the MSCAN class.
Can you explain this, please?
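To make the discrepancy concrete: as I understand it, the ratio only sets the hidden width of each stage's FFN (hidden_dim = embed_dim * mlp_ratio), so the two settings differ only in the first two stages. The embed_dims below are illustrative placeholders, not values quoted from the repo:

def ffn_hidden_dims(embed_dims, mlp_ratios):
    """Hidden width of each stage's FFN: embed_dim * mlp_ratio."""
    return [d * r for d, r in zip(embed_dims, mlp_ratios)]

embed_dims = [64, 128, 256, 512]  # illustrative example, not from the repo
print(ffn_hidden_dims(embed_dims, [8, 8, 4, 4]))  # paper:        [512, 1024, 1024, 2048]
print(ffn_hidden_dims(embed_dims, [4, 4, 4, 4]))  # code default: [256, 512, 1024, 2048]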