facebookresearch / ConvNeXt-V2

Code release for ConvNeXt V2 model

On masking input images #56

Open xiaohao-lin1 opened 1 year ago

xiaohao-lin1 commented 1 year ago

Dear Author,

In the paper, you mention that masking is done on the raw images. However, in the code, masking is only applied after the stem layer. Can you explain the inconsistency? Thank you!

hughsando commented 1 year ago

I think it is because the output of the stem layer is aligned to the patch boundaries, so the two are equivalent, and masking after the stem lets the mask layers work at a lower resolution. You could potentially save some compute by not calculating the stem for masked-out blocks, but the overhead may prevent this from being worthwhile.
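To make the alignment argument concrete, here is a minimal NumPy sketch, not the repository's code. It assumes the stem behaves like a non-overlapping 4x4, stride-4 convolution and that masking uses 32x32 patches; `patchify_stem` and `upsample_mask` are hypothetical helpers written for this illustration, and the stand-in stem leaves out any normalization.

```python
import numpy as np

def patchify_stem(img, weight, bias, k=4):
    """Hypothetical stand-in for the stem: a non-overlapping k x k, stride-k conv.
    img: (C, H, W); weight: (D, C, k, k); bias: (D,). Returns (D, H//k, W//k)."""
    C, H, W = img.shape
    # Split the image into k x k blocks: (C, H/k, k, W/k, k) -> (H/k, W/k, C, k, k)
    blocks = img.reshape(C, H // k, k, W // k, k).transpose(1, 3, 0, 2, 4)
    out = np.einsum("hwcij,dcij->dhw", blocks, weight)
    return out + bias[:, None, None]

def upsample_mask(mask, scale):
    """Nearest-neighbour upsampling of a 2-D binary mask by an integer factor."""
    return np.kron(mask, np.ones((scale, scale), dtype=mask.dtype))

rng = np.random.default_rng(0)
C, D, H, W = 3, 8, 64, 64
stem_stride, patch = 4, 32          # assumed values for this sketch

img = rng.standard_normal((C, H, W))
weight = rng.standard_normal((D, C, stem_stride, stem_stride))
bias = rng.standard_normal(D)

patch_mask = np.array([[1, 0], [0, 1]])                       # 1 = masked, one entry per patch
pixel_mask = upsample_mask(patch_mask, patch)                 # (64, 64), image resolution
feat_mask = upsample_mask(patch_mask, patch // stem_stride)   # (16, 16), stem-output resolution

# Option A: mask the raw image, then apply the stem.
feat_from_masked_img = patchify_stem(img * (1 - pixel_mask), weight, bias)
# Option B: apply the stem, then mask its output with the lower-resolution mask.
feat_masked_after = patchify_stem(img, weight, bias) * (1 - feat_mask)

visible = feat_mask == 0
same_on_visible = np.allclose(feat_from_masked_img[:, visible],
                              feat_masked_after[:, visible])
print("identical on visible positions:", same_on_visible)
```

Under these assumptions each stem output position sees exactly one 4x4 block of pixels, and every block lies entirely inside one patch, so the visible features match in both orderings. The masked positions do differ in this sketch (the conv bias in option A, zeros in option B), which only matters if later layers actually read those positions.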