BR-IDL / PaddleViT

:robot: PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
https://github.com/BR-IDL/PaddleViT
Apache License 2.0

TopFormer implementation differs from original reference implementation #231


dominikandreas commented 1 year ago

`sig_act` is computed differently from the original reference implementation. Compare https://github.com/BR-IDL/PaddleViT/blob/55b33c3d11c16f7fe5069cbd85962a68c4867ded/semantic_segmentation/src/models/backbones/top_transformer.py#L330-L334

with

https://github.com/hustvl/TopFormer/blob/2dc253c49ef78742ca6b44e550c5fea63a274288/mmseg/models/backbones/topformer.py#L328

I assume this is not intentional. The fix is straightforward:

    def forward(self, x_local, x_global):
        '''
        x_local: local features (their spatial size sets the output resolution)
        x_global: global features
        '''
        B, C, H, W = x_local.shape
        local_feat = self.local_embedding(x_local)

        # Apply the gating activation to the global branch first, then
        # upsample to the local resolution, matching the reference implementation.
        global_act = self.global_act(x_global)
        sig_act = F.interpolate(self.act(global_act), size=(H, W), mode='bilinear', align_corners=False)

        global_feat = self.global_embedding(x_global)
        global_feat = F.interpolate(global_feat, size=(H, W), mode='bilinear', align_corners=False)

        # Gate the local features with the upsampled activation and add the
        # upsampled global embedding.
        out = local_feat * sig_act + global_feat
        return out
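
For reference, the order of the activation and the upsampling is not interchangeable, which is why the swap matters. Below is a minimal sketch (not part of the fix, and assuming `self.act` is the h-sigmoid used in TopFormer) showing that activating before bilinear interpolation, as the reference does, gives different values than activating afterwards:

    import paddle
    import paddle.nn.functional as F

    # Stand-in for the output of self.global_act(x_global); values are spread
    # across the h-sigmoid's saturated and linear regions on purpose.
    # hardsigmoid here is an assumption about what self.act is in TopFormer.
    global_act = paddle.linspace(-6.0, 6.0, 16).reshape([1, 1, 4, 4])
    target_hw = (8, 8)

    # Reference ordering: activate first, then upsample.
    ref = F.interpolate(F.hardsigmoid(global_act), size=target_hw,
                        mode='bilinear', align_corners=False)

    # Swapped ordering: upsample first, then activate.
    swapped = F.hardsigmoid(F.interpolate(global_act, size=target_hw,
                                          mode='bilinear', align_corners=False))

    print(paddle.allclose(ref, swapped).item())   # False: the orderings are not equivalent
    print(float((ref - swapped).abs().max()))     # non-zero difference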