Hi, I'm interested in implementing ViT-Small (img_size = 256) with MAE so that I can compare the model against different SSL methods. Could anyone confirm whether my implementation is correct?
# Assumes this sits in MAE's models_mae.py, where MaskedAutoencoderViT is defined.
from functools import partial
import torch.nn as nn

def mae_vit_small_patch16_dec384d8b(**kwargs):
    # ViT-Small encoder (384-dim, 12 blocks, 6 heads) with an 8-block decoder
    model = MaskedAutoencoderViT(
        img_size=256, patch_size=16, embed_dim=384, depth=12, num_heads=6,
        decoder_embed_dim=192, decoder_depth=8, decoder_num_heads=16,
        mlp_ratio=4, norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
    return model
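For reference, here is a quick sanity check of the dimensions in the config above. It is a pure-arithmetic sketch (no torch needed), and the variable names are illustrative, not from the MAE codebase:

```python
# Sanity-check the ViT-Small / MAE config (illustrative names, plain arithmetic).
img_size, patch_size = 256, 16
embed_dim, num_heads = 384, 6
decoder_embed_dim, decoder_num_heads = 192, 16

# Encoder sequence length before the cls token: (256 / 16)^2 patches.
num_patches = (img_size // patch_size) ** 2
print(num_patches)                                 # 256

# Head dimensions must divide evenly.
print(embed_dim // num_heads)                      # 64 (encoder head dim)
print(decoder_embed_dim % decoder_num_heads == 0)  # True (decoder head dim 12)
```

Note that decoder_num_heads=16 with decoder_embed_dim=192 gives a decoder head dim of 12, which is smaller than usual; something like decoder_num_heads=6 would keep the head dim at 32, but either divides evenly.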
Also, I noticed that when training starts, the lr is printed as 0.00000, as shown below. Is my setup correct?
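My understanding is that the MAE training recipe uses a linear warmup followed by half-cycle cosine decay, so the effective lr at epoch 0 is exactly 0 and gets logged as 0.00000. A minimal sketch of that schedule (parameter values here are assumptions, not the repo's defaults for this model):

```python
import math

def scheduled_lr(base_lr, epoch, warmup_epochs=40, total_epochs=800, min_lr=0.0):
    """Linear warmup, then half-cycle cosine decay (MAE-style schedule sketch)."""
    if epoch < warmup_epochs:
        # Warmup: lr ramps linearly from 0 up to base_lr.
        return base_lr * epoch / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + (base_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))

# Epoch 0 sits at the very start of warmup, so the schedule returns 0.
print(f"{scheduled_lr(1.5e-4, 0):.5f}")   # 0.00000
print(f"{scheduled_lr(1.5e-4, 40):.5f}")  # 0.00015 (warmup complete)
```

If that matches what you see, the 0.00000 at the first step is expected behavior rather than a bug.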