Why freeze the parameters of conv1 in ViT?

Sense-GVT / DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm


Why freeze the parameters of conv1 in ViT? #9

Closed by Yuting-Gao 2 years ago

zlccccc commented 2 years ago

As described in MoCo v3 [https://arxiv.org/abs/2104.02057], random patch projection (i.e., freezing the parameters of conv1 in ViT) stabilizes training and gives smoother, better training curves; this also holds in our framework. However, although the MoCo v3 authors argue that this stability benefits final accuracy, we saw no significant gain in our previous experiments.
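For readers unfamiliar with the trick, freezing the patch projection can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the DeCLIP training code: `TinyViTStem` and `freeze_patch_projection` are hypothetical names, the transformer blocks are omitted, and the only thing assumed from the discussion above is that the patch projection is a convolution called `conv1`.

```python
import torch
import torch.nn as nn


class TinyViTStem(nn.Module):
    """Hypothetical stand-in for a ViT: patch projection plus a linear head."""

    def __init__(self, patch_size=16, width=64, num_classes=10):
        super().__init__()
        # Patch projection: a strided convolution, here named conv1.
        self.conv1 = nn.Conv2d(3, width, kernel_size=patch_size,
                               stride=patch_size, bias=False)
        self.head = nn.Linear(width, num_classes)

    def forward(self, x):
        x = self.conv1(x)             # (B, width, H/ps, W/ps)
        x = x.flatten(2).mean(dim=2)  # average over patches -> (B, width)
        return self.head(x)


def freeze_patch_projection(model):
    """Keep conv1 at its random initialization by disabling its gradients."""
    for p in model.conv1.parameters():
        p.requires_grad = False


torch.manual_seed(0)
model = TinyViTStem()
freeze_patch_projection(model)

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1
)

conv1_before = model.conv1.weight.detach().clone()
head_before = model.head.weight.detach().clone()

# One training step on random data.
images = torch.randn(2, 3, 32, 32)
labels = torch.tensor([1, 3])
loss = nn.functional.cross_entropy(model(images), labels)
loss.backward()
optimizer.step()

conv1_unchanged = torch.equal(conv1_before, model.conv1.weight)
head_changed = not torch.equal(head_before, model.head.weight)
```

After the step, `conv1` still holds its random initial weights while the rest of the model has been updated, which is what "random patch projection" refers to.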