Alpha-VL / ConvMAE

ConvMAE: Masked Convolution Meets Masked Autoencoders

mask convolution #7

Closed: cathylao closed this issue 2 years ago

cathylao commented 2 years ago

Hi! Thanks for the open-source code. I noticed that the masked convolution in the code only masks the residual branch, while the skip connection is not masked, as shown in line 119 of "ConvMAE/vision_transformer.py". The corresponding code is: "x = x + self.drop_path(self.conv2(self.attn(mask * self.conv1(self.norm1(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)))))". Will this lead to information leakage in the convolution stages?
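
For illustration, here is a minimal sketch of the block structure in question (the class name `MaskedConvBlockSketch` and the kernel size are illustrative assumptions, and `drop_path` is omitted; this is not the actual ConvMAE code). The mask is applied only inside the residual branch, while the skip connection adds `x` back unmasked:

```python
import torch
import torch.nn as nn

class MaskedConvBlockSketch(nn.Module):
    """Simplified sketch of the block under discussion: the mask is applied
    only inside the residual branch; the skip connection stays unmasked."""
    def __init__(self, dim):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.conv1 = nn.Conv2d(dim, dim, 1)                        # pointwise
        self.attn = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)  # depthwise "local attention"
        self.conv2 = nn.Conv2d(dim, dim, 1)                        # pointwise

    def forward(self, x, mask):
        # x: (B, C, H, W); mask: (B, 1, H, W) with 1 = visible, 0 = masked
        y = self.norm1(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        y = self.conv2(self.attn(mask * self.conv1(y)))  # residual branch is masked
        return x + y                                     # skip connection is NOT masked
```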

gaopengpjlab commented 2 years ago

Thanks for your interest. There are two ways to verify whether information leakage occurs:

  1. Please check the reconstruction loss. It empirically shows that no information leakage happens inside ConvMAE. https://drive.google.com/file/d/1Je9ClIGCQP43xC3YURVFPnaMRC0-ax1h/view?usp=sharing
  2. Inside stage 1 and stage 2, spatial information aggregation only happens in the depthwise convolution. Masking the depthwise convolution therefore prevents information leakage, and adding mask operations to other operators is redundant (a quick check of this argument is sketched below).
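
A quick numerical check of this argument, reusing the hypothetical `MaskedConvBlockSketch` from the sketch above (an assumed, simplified setup): the pointwise convolutions act per position, so the depthwise convolution is the only place where spatial positions mix, and its input is masked. As a result, perturbing the content of masked tokens does not change the block output at visible positions:

```python
import torch

torch.manual_seed(0)
B, C, H, W = 1, 8, 14, 14
block = MaskedConvBlockSketch(C)                 # sketch class from above
mask = (torch.rand(B, 1, H, W) > 0.75).float()   # 1 = visible, 0 = masked

x = torch.randn(B, C, H, W)
x_perturbed = x + (1 - mask) * torch.randn(B, C, H, W)  # change only the masked tokens

out_a, out_b = block(x, mask), block(x_perturbed, mask)
# Outputs at visible positions are identical -> masked content does not leak there.
print(torch.allclose(out_a * mask, out_b * mask))  # True
```
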
ding3820 commented 2 years ago

Hi @gaopengpjlab and @Alpha-VL,

I just found this thread discussing the issue I was curious about. I understand that the local attention only operates through the depthwise convolution, but there is still an FFN after it. It is possible that the FFN mixes in information from the skip-connection branch and leads to leakage. In the extreme case, the network could still learn to pass the original image through the entire model via the skip connections (a small check of this scenario is sketched below). Let me know if I misunderstand anything. Thanks!
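
A small check of the scenario described here, again reusing the hypothetical `MaskedConvBlockSketch` from the first comment: because the skip connection is unmasked, the block output at masked positions still depends on the masked tokens' original content, i.e. that content is carried forward through the block:

```python
import torch

torch.manual_seed(0)
B, C, H, W = 1, 8, 14, 14
block = MaskedConvBlockSketch(C)                 # sketch class from the first comment
mask = (torch.rand(B, 1, H, W) > 0.75).float()   # 1 = visible, 0 = masked

x = torch.randn(B, C, H, W)
x_perturbed = x + (1 - mask) * torch.randn(B, C, H, W)  # alter only the masked tokens

out_a, out_b = block(x, mask), block(x_perturbed, mask)
# Visible positions are unaffected (see the earlier check), but outputs at
# masked positions differ: the skip connection forwards their content untouched.
print(torch.allclose(out_a * (1 - mask), out_b * (1 - mask)))  # False
```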

aichifandefan commented 1 year ago

Yeah, I have the same doubt.