5g4s / paper


SimMIM: A Simple Framework for Masked Image Modeling #26

Open 5g4s opened 1 year ago

5g4s commented 1 year ago

https://openaccess.thecvf.com/content/CVPR2022/papers/Xie_SimMIM_A_Simple_Framework_for_Masked_Image_Modeling_CVPR_2022_paper.pdf

5g4s commented 1 year ago

Key findings:
1) Random masking of the input image with a moderately large masked patch size (e.g., 32) makes a powerful pretext task.
2) Predicting the RGB values of raw pixels by direct regression performs no worse than patch classification approaches with complex designs.
3) The prediction head can be as light as a single linear layer and performs no worse than heavier heads.
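The three points above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names, the 192-pixel image size, the 0.6 mask ratio, the feature dimension, and the choice of an L1 loss are all assumptions made here for the example, and the encoder is replaced by random placeholder features.

```python
import numpy as np

def random_patch_mask(img_size, mask_patch_size=32, mask_ratio=0.6, rng=None):
    """Return a boolean (H, W) mask; True marks masked pixels.

    mask_ratio and img_size are illustrative assumptions, not from the thread.
    """
    rng = np.random.default_rng() if rng is None else rng
    assert img_size % mask_patch_size == 0
    grid = img_size // mask_patch_size
    num_patches = grid * grid
    num_masked = int(round(mask_ratio * num_patches))
    flat = np.zeros(num_patches, dtype=bool)
    flat[rng.permutation(num_patches)[:num_masked]] = True
    patch_mask = flat.reshape(grid, grid)
    # Expand each patch-level decision to pixel resolution.
    return np.repeat(np.repeat(patch_mask, mask_patch_size, axis=0),
                     mask_patch_size, axis=1)

def linear_head(features, weight, bias, grid, patch):
    """Map (grid*grid, D) features to an (H, W, 3) image with one linear layer."""
    out = features @ weight + bias                 # (grid*grid, patch*patch*3)
    out = out.reshape(grid, grid, patch, patch, 3)
    out = out.transpose(0, 2, 1, 3, 4)             # (grid, patch, grid, patch, 3)
    return out.reshape(grid * patch, grid * patch, 3)

def masked_l1_loss(pred, target, mask):
    """Mean absolute error over masked pixels only (L1 is an assumed choice)."""
    return np.abs(pred - target)[mask].mean()

# Toy usage with hypothetical sizes; random features stand in for an encoder.
rng = np.random.default_rng(0)
img_size, patch, dim = 192, 32, 16
grid = img_size // patch
image = rng.random((img_size, img_size, 3))
mask = random_patch_mask(img_size, patch, mask_ratio=0.6, rng=rng)
features = rng.standard_normal((grid * grid, dim))
weight = rng.standard_normal((dim, patch * patch * 3)) * 0.01
bias = np.zeros(patch * patch * 3)
pred = linear_head(features, weight, bias, grid, patch)
loss = masked_l1_loss(pred, image, mask)
```

The loss is computed only where `mask` is true, so visible pixels contribute nothing to the regression target.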
