Alpha-VL / ConvMAE

ConvMAE: Masked Convolution Meets Masked Autoencoders
MIT License

refactor hard coded numbers for more control over parameters (MaskedAutoencoderConvViT) #25

Open DanTaranis opened 1 year ago

DanTaranis commented 1 year ago

Hi - I'd like to use patches of size 32x32, and a smaller model in general, but anything I change breaks the code. It would be really helpful if you refactored out all of the hard-coded values (4, 2, 16, etc.) throughout MaskedAutoencoderConvViT.

Thanks, Dan

gaopengpjlab commented 1 year ago

Sorry for the trouble. Please refer to the following code for ViT-16.

img_size = [224, 56, 28]
feat_size = [56, 28, 14]
rel_scale1 = int(feat_size[0] / feat_size[2])  # e.g. 56 / 14 = 4
rel_scale2 = int(feat_size[1] / feat_size[2])  # e.g. 28 / 14 = 2
mask_for_patch1 = (
    mask.reshape(-1, feat_size[-1], feat_size[-1])
    .unsqueeze(-1)
    .repeat(1, 1, 1, rel_scale1 ** 2)
    .reshape(-1, feat_size[-1], feat_size[-1], rel_scale1, rel_scale1)
    .permute(0, 1, 3, 2, 4)
    .reshape(x.shape[0], feat_size[0], feat_size[0])
    .unsqueeze(1)
)
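For intuition: the reshape/repeat/permute chain above is just nearest-neighbor upsampling of the coarse stage-3 mask to the stage-1 resolution. Here is a minimal NumPy sketch of the same operation (the helper name `expand_mask` is mine, not from the repo):

```python
import numpy as np

def expand_mask(mask, feat_size=(56, 28, 14)):
    """Upsample a binary mask from the coarsest grid to the stage-1 grid.

    mask: (B, feat_size[-1] ** 2) binary array, one entry per stage-3 token.
    Returns: (B, 1, feat_size[0], feat_size[0]) mask at stage-1 resolution.
    """
    B = mask.shape[0]
    f = feat_size[-1]
    scale1 = feat_size[0] // f  # relative scale, e.g. 56 // 14 = 4
    m = mask.reshape(B, f, f)
    # replicate each mask token into a scale1 x scale1 block (nearest neighbor)
    m = np.repeat(np.repeat(m, scale1, axis=1), scale1, axis=2)
    return m[:, None, :, :]
```

Each masked stage-3 token therefore masks the whole scale1 x scale1 patch of stage-1 features it corresponds to.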

You also need to modify the stride for self.stage1_output_decode / self.stage2_output_decode accordingly.
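If you change feat_size, one way to keep those strides consistent is to derive them from the feature-map ratios rather than hard-coding 4 and 2. A hedged sketch (variable names are mine; check them against your actual feat_size before use):

```python
# Derive the decode convolution strides from the feature-map sizes,
# so they stay correct when feat_size changes.
feat_size = [56, 28, 14]
stride1 = feat_size[0] // feat_size[2]  # stage-1 features down to the stage-3 grid
stride2 = feat_size[1] // feat_size[2]  # stage-2 features down to the stage-3 grid
```

With the default sizes this reproduces the hard-coded strides (4 and 2); with a different patch size you would recompute feat_size first and these follow automatically.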