Open · DanTaranis opened this issue 1 year ago
Sorry for the trouble. Please refer to the following code for ViT-16.
```python
img_size = [224, 56, 28]
feat_size = [56, 28, 14]
rel_scale1 = int(feat_size[0] / feat_size[2])  # 56 / 14 = 4
rel_scale2 = int(feat_size[1] / feat_size[2])  # 28 / 14 = 2

# Expand the 14x14 mask to the stage-1 resolution (56x56): each masked token
# is repeated rel_scale1 x rel_scale1 times, preserving spatial layout.
mask_for_patch1 = (
    mask.reshape(-1, feat_size[-1], feat_size[-1])
    .unsqueeze(-1)
    .repeat(1, 1, 1, rel_scale1 ** 2)
    .reshape(-1, feat_size[-1], feat_size[-1], rel_scale1, rel_scale1)
    .permute(0, 1, 3, 2, 4)
    .reshape(x.shape[0], feat_size[0], feat_size[0])
    .unsqueeze(1)
)
```
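For completeness, a sketch of the corresponding `mask_for_patch2` following the same pattern (an assumption based on the snippet above, using `rel_scale2` and `feat_size[1]`; adjust to your actual code):

```python
# Sketch: same expansion as mask_for_patch1, but upsampling the 14x14 mask
# by rel_scale2 (= 2) to the stage-2 resolution (28x28).
mask_for_patch2 = (
    mask.reshape(-1, feat_size[-1], feat_size[-1])
    .unsqueeze(-1)
    .repeat(1, 1, 1, rel_scale2 ** 2)
    .reshape(-1, feat_size[-1], feat_size[-1], rel_scale2, rel_scale2)
    .permute(0, 1, 3, 2, 4)
    .reshape(x.shape[0], feat_size[1], feat_size[1])
    .unsqueeze(1)
)
```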
You also need to modify the stride of `self.stage1_output_decode` / `self.stage2_output_decode` accordingly; see the sketch below.
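A minimal sketch of what that looks like (assuming the decode layers are plain `nn.Conv2d` projections as in the repo, and ConvMAE-Base embedding dims; adapt the names and values to your configuration):

```python
import torch.nn as nn

embed_dim = [256, 384, 768]    # assumption: per-stage embedding dims (ConvMAE-Base)
rel_scale1, rel_scale2 = 4, 2  # 56/14 and 28/14 for the default ViT-16 setting

# Inside MaskedAutoencoderConvViT.__init__, the decode projections downsample
# the stage-1 (56x56) and stage-2 (28x28) feature maps to the final 14x14 grid,
# so their kernel size / stride must match the relative scale factors above.
stage1_output_decode = nn.Conv2d(embed_dim[0], embed_dim[2],
                                 kernel_size=rel_scale1, stride=rel_scale1)
stage2_output_decode = nn.Conv2d(embed_dim[1], embed_dim[2],
                                 kernel_size=rel_scale2, stride=rel_scale2)
```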
Hi - I'd like to use patches of size 32x32, and a smaller model in general, but anything I change breaks the code. It would be really helpful if you refactored out all of the places that hard-code 4, 2, 16, etc. throughout `MaskedAutoencoderConvViT`.
Thanks, Dan