HVision-NKU / Conv2Former


some questions... #3

Open lovekittynine opened 1 year ago

lovekittynine commented 1 year ago

Hello, I have some small questions about the code.

Thanks for your reply!

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, dim, mlp_ratio=4):
        super().__init__()

        # the repo's channels_first LayerNorm variant, not nn.LayerNorm
        self.norm = LayerNorm(dim, eps=1e-6, data_format="channels_first")

        # 1x1 conv expands the channel dimension by mlp_ratio
        self.fc1 = nn.Conv2d(dim, dim * mlp_ratio, 1)
        # 3x3 depthwise conv injects local spatial context / positional information
        self.pos = nn.Conv2d(dim * mlp_ratio, dim * mlp_ratio, 3, padding=1, groups=dim * mlp_ratio)
        # 1x1 conv projects back to the original channel dimension
        self.fc2 = nn.Conv2d(dim * mlp_ratio, dim, 1)
        self.act = nn.GELU()

    def forward(self, x):
        B, C, H, W = x.shape

        x = self.norm(x)
        x = self.fc1(x)
        x = self.act(x)
        # residual branch through the depthwise conv
        x = x + self.act(self.pos(x))
        x = self.fc2(x)

        return x
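
For reference, a quick shape check of this MLP block. The channels_first LayerNorm below is a minimal stand-in mirroring the ConvNeXt-style implementation, since the repo's own LayerNorm class is not shown in the snippet above:

import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    # Minimal channels_first LayerNorm stand-in (assumption; the repo ships its own version).
    def __init__(self, dim, eps=1e-6, data_format="channels_first"):
        super().__init__()
        assert data_format == "channels_first"
        self.weight = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))
        self.eps = eps

    def forward(self, x):
        # x: (B, C, H, W) -> normalize over the channel dimension
        u = x.mean(1, keepdim=True)
        s = (x - u).pow(2).mean(1, keepdim=True)
        x = (x - u) / torch.sqrt(s + self.eps)
        return self.weight[:, None, None] * x + self.bias[:, None, None]

mlp = MLP(dim=64, mlp_ratio=4)
out = mlp(torch.randn(2, 64, 56, 56))
print(out.shape)  # torch.Size([2, 64, 56, 56]) -- channels and spatial size are preserved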
houqb commented 1 year ago

Thanks for the questions.

1) We did indeed omit the description of the 3x3 depthwise conv in the MLP and will update the paper.
2) You may refer to the RepLKNet paper for more explanation on this. In addition, this is beneficial to downstream tasks, which require higher-resolution inputs.

whiteinblue commented 1 year ago

Extra question: you add self.layer_scale_1 and self.layer_scale_2 to the ConvMod block, which also introduces extra parameters. What is the effect of these two scale parameters?

houqb commented 1 year ago

If you use a Hadamard product, the magnitude of the feature values tends to be larger than with addition. These scale parameters help the optimization process; the technique has been widely used in modern network architectures. You may refer to CaiT by Touvron et al. for more details.
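
For readers less familiar with layer scale, below is a minimal sketch of how such per-channel scale parameters are typically applied around a residual branch, in the spirit of CaiT. The branch here deliberately uses a Hadamard product so the scaling is relevant; the class name, kernel sizes, and normalization are illustrative placeholders rather than the repo's exact ConvMod code:

import torch
import torch.nn as nn

class ModulationBlockSketch(nn.Module):
    # Sketch of a residual block whose branch output is a Hadamard product,
    # scaled by a small learnable per-channel factor (CaiT-style layer scale).
    def __init__(self, dim, layer_scale_init=1e-6):
        super().__init__()
        self.norm = nn.GroupNorm(1, dim)                         # placeholder normalization
        self.a = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)   # modulation map
        self.v = nn.Conv2d(dim, dim, 1)                          # value projection
        self.proj = nn.Conv2d(dim, dim, 1)
        # per-channel scale, broadcast over (H, W); initialized near zero so the
        # residual branch starts small and grows during training
        self.layer_scale_1 = nn.Parameter(layer_scale_init * torch.ones(dim))

    def forward(self, x):
        y = self.norm(x)
        y = self.proj(self.a(y) * self.v(y))                     # Hadamard product branch
        return x + self.layer_scale_1[None, :, None, None] * y

Without the scale, the multiplicative branch can produce feature magnitudes noticeably larger than an additive branch would, so the near-zero initialization lets the optimizer increase the branch's contribution gradually during training.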