facebookresearch / mae

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
6.93k stars 1.17k forks

A question about DropPath in pretraining #166

Open YangSun22 opened 1 year ago

YangSun22 commented 1 year ago

I found that DropPath is set to 0 during pre-training but to 0.1 during fine-tuning. This does not match the way Dropout is normally used, since it is supposed to prevent overfitting. Why is it not used in pre-training?

class Block(nn.Module):

    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
                 drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm):
        super().__init__()  # must run before submodules are assigned on an nn.Module
        self.norm1 = norm_layer(dim)
        self.attn = Attention(
            dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)
        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

    def forward(self, x):
        x = x + self.drop_path(self.attn(self.norm1(x)))
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x

alexlioralexli commented 6 months ago

It could be that the reconstruction task is very hard, so there's no overfitting during pretraining.
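
For anyone comparing the two settings, here is a minimal sketch of what DropPath (stochastic depth) does to a residual branch. It is plain Python rather than the tensor-based DropPath module the repo uses, and the names `drop_path` and `residual_block` are hypothetical stand-ins for illustration. The sketch assumes the usual formulation: each sample's branch output is zeroed with probability `drop_prob`, and kept outputs are scaled by `1 / keep_prob`. With `drop_prob=0` (the pre-training setting) the branch passes through unchanged, which corresponds to the `nn.Identity()` case in the Block above.

```python
import random


def drop_path(branch_outputs, drop_prob=0.0, training=True, rng=random):
    """Per-sample stochastic depth on a batch of residual-branch outputs.

    branch_outputs is a list of per-sample feature lists, used here as a
    stand-in for a (batch, features) tensor.
    """
    # Disabled (drop_prob == 0) or eval mode: behave as the identity.
    if drop_prob == 0.0 or not training:
        return branch_outputs

    keep_prob = 1.0 - drop_prob
    out = []
    for sample in branch_outputs:
        # One keep/drop decision per sample, not per feature.
        keep = rng.random() < keep_prob
        # Rescale kept samples so the expected output matches the input.
        scale = 1.0 / keep_prob if keep else 0.0
        out.append([v * scale for v in sample])
    return out


def residual_block(x, branch, drop_prob, training=True, rng=random):
    """Compute x + drop_path(branch(x)), mirroring the pattern in Block.forward."""
    fx = drop_path([branch(s) for s in x], drop_prob, training, rng)
    return [[a + b for a, b in zip(xs, fs)] for xs, fs in zip(x, fx)]
```

With `drop_prob=0.1` (the fine-tuning setting), roughly one in ten samples skips the branch in each block at training time, so the block reduces to `x` for those samples. At evaluation time the function is always the identity.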