Alpha-VL / ConvMAE

ConvMAE: Masked Convolution Meets Masked Autoencoders

Have you tried a pure convolution network? #3

Closed Dongshengjiang closed 2 years ago

Dongshengjiang commented 2 years ago

Have you tried a pure convolution network? Does this work?

gaopengpjlab commented 2 years ago

Thanks for your interest in ConvMAE.

  1. Pure convolution leads to a pretraining/finetuning inconsistency (there is no MASK token during fine-tuning), which slightly decreases accuracy on classification/detection/segmentation (CLS/DET/SEG).
  2. Masked convolution is equivalent to performing local aggregation (conv) on the visible tokens only. An optimized implementation can save up to 75% of the FLOPs during pretraining (see the sketch below).
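
For intuition, here is a minimal PyTorch sketch of the masked-convolution idea. It is not the repository's actual implementation; `MaskedConvBlock`, its mask handling, and the shapes used are illustrative assumptions. The point is that the convolution only aggregates information from visible tokens, so with a 75% mask ratio an optimized (sparse) implementation could skip most of the computation.

```python
# Illustrative sketch only: masked convolution restricted to visible tokens.
import torch
import torch.nn as nn

class MaskedConvBlock(nn.Module):
    """Hypothetical depthwise conv block that respects a binary visibility mask."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)

    def forward(self, x: torch.Tensor, visible: torch.Tensor) -> torch.Tensor:
        # x:       (B, C, H, W) feature map
        # visible: (B, 1, H, W) binary mask, 1 = visible patch, 0 = masked patch
        x = x * visible          # hide masked positions from the convolution
        x = self.conv(x)
        return x * visible       # discard values written into masked positions

if __name__ == "__main__":
    x = torch.randn(2, 64, 14, 14)
    # ~75% of patches masked, as in MAE-style pretraining
    visible = (torch.rand(2, 1, 14, 14) > 0.75).float()
    out = MaskedConvBlock(64)(x, visible)
    print(out.shape)  # torch.Size([2, 64, 14, 14])
```

In this naive form the masked positions are still multiplied through; a truly FLOP-saving variant would gather only the visible tokens (or use sparse convolution) so that roughly 75% of the work is skipped.
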
Dongshengjiang commented 2 years ago

Thanks for your rapid reply. Is MIM pretraining without a mask token better than contrastive learning for convolutional networks?

gaopengpjlab commented 2 years ago

Is your question (pure convolution vs. hybrid convolution/transformer vs. transformer) or (pure convolution vs. masked convolution)?

Dongshengjiang commented 2 years ago

I mean: is the performance of MAE for convolutional networks (such as ResNet-50 or ConvNeXt) better than that of traditional contrastive learning methods (such as DINO or BYOL)?

gaopengpjlab commented 2 years ago

Evaluating pure convolutional networks under different pretraining paradigms such as MAE, DINO, and BYOL is beyond the scope of this paper.