LeapLabTHU / MLLA

Official repository of MLLA (NeurIPS 2024)

Why set `hid_exp_ratio=4` in the stem and downsample layers? #27

Open Journey7331 opened 2 weeks ago

Journey7331 commented 2 weeks ago

Congrats on your excellent work!

As shown in your code, ratio=4 is used in the stem and downsample layers. Is this setting meant to align with mlp_ratio in MLLABlock, or is there some trick here?

class Stem(nn.Module):
    ...
    self.conv3 = nn.Sequential(
        # 3x3 stride-2 conv expands to embed_dim * 4 hidden channels
        ConvLayer(embed_dim // 2, embed_dim * 4, kernel_size=3, stride=2, padding=1, bias=False),
        # 1x1 conv projects back down to embed_dim
        ConvLayer(embed_dim * 4, embed_dim, kernel_size=1, bias=False, act_func=None)
    )

class PatchMerging(nn.Module):
    def __init__(self, input_resolution, dim, ratio=4.0):
        super().__init__()
        self.input_resolution = input_resolution
        self.dim = dim
        in_channels = dim
        out_channels = 2 * dim
        self.conv = nn.Sequential(
            # 1x1 conv expands to out_channels * ratio hidden channels
            ConvLayer(in_channels, int(out_channels * ratio), kernel_size=1, norm=None),
            # 3x3 stride-2 depthwise conv (groups == channels) downsamples spatially
            ConvLayer(int(out_channels * ratio), int(out_channels * ratio), kernel_size=3, stride=2, padding=1, groups=int(out_channels * ratio), norm=None),
            # 1x1 conv projects back down to out_channels
            ConvLayer(int(out_channels * ratio), out_channels, kernel_size=1, act_func=None)
        )
tian-qing001 commented 1 week ago

Hi @Journey7331, thanks for your question. We set ratio=4 in the stem and downsample layers to align with the mlp_ratio. Adjusting ratio may lead to better performance, but tuning it is not the focus of our paper, so we keep ratio=4 as the default.
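To make the alignment concrete: with mlp_ratio=4, the feed-forward part of each block expands channels by 4x and then projects back, which is the same expand-then-project pattern as the conv sequences above. A minimal sketch of a standard two-layer MLP (the actual Mlp inside MLLABlock may differ in details):

import torch.nn as nn

class Mlp(nn.Module):
    # Standard transformer feed-forward block: with mlp_ratio=4 the
    # hidden width is 4 * dim, matching ratio=4 in Stem / PatchMerging.
    def __init__(self, dim, mlp_ratio=4.0, act_layer=nn.GELU):
        super().__init__()
        hidden_dim = int(dim * mlp_ratio)
        self.fc1 = nn.Linear(dim, hidden_dim)  # expand: dim -> 4 * dim
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_dim, dim)  # project: 4 * dim -> dim

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))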