LTH14 / rcg

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
MIT License

Why is the projector head MLP set to requires_grad=False? #23

Open woshixiaobai2019 opened 6 months ago

woshixiaobai2019 commented 6 months ago

I noticed that a new projector head MLP is added after loading the pre-trained MoCo v3 model. However, the parameters of this newly added component are also set to requires_grad=False.

My question is: since this MLP head is randomly initialized, why does it not require any training before being used for feature projection?

Intuitively, adding an untrained random projection head could disrupt the original feature distributions learned by the pre-trained encoder. So what is the motivation behind fixing the parameters of this newly added head?

Does it relate to better retaining the pre-trained feature distributions? Or to leveraging the fixed random projections to improve generalization on downstream tasks?

It would be great if someone could explain the rationale behind not training the added projector head. Thanks!

ys-koshelev commented 1 month ago

I have the same question. @LTH14, could you please clarify how you train the newly added projection head of the MoCo v3 model?

LTH14 commented 1 month ago

The head in moco_vits is inherited from timm's VisionTransformer, which is a single Linear layer. However, the projection head of the pre-trained moco-v3 is an MLP (module.base_encoder.head). I don't want to change the original MoCo code and re-train that model, so instead I replace the Linear head with an MLP head so that the pre-trained weights can be loaded.
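A minimal sketch of what that head swap looks like (not the repository code verbatim; the model name and the MLP layer sizes are illustrative assumptions based on the public MoCo v3 projector):

```python
import torch.nn as nn
import timm

# timm's VisionTransformer ends in a single nn.Linear classification head,
# while the MoCo v3 checkpoint stores an MLP projector under
# module.base_encoder.head.*. Swapping the Linear for an MLP with matching
# parameter names lets the pre-trained projector weights be loaded directly.
encoder = timm.create_model('vit_small_patch16_224', pretrained=False)

embed_dim = encoder.head.in_features   # 384 for ViT-Small
hidden_dim, proj_dim = 4096, 256       # assumed MoCo v3 projector sizes

# Replace the single Linear head with an MLP head (structure is illustrative;
# see the linked models_mage.py for the actual construction).
encoder.head = nn.Sequential(
    nn.Linear(embed_dim, hidden_dim),
    nn.BatchNorm1d(hidden_dim),
    nn.ReLU(inplace=True),
    nn.Linear(hidden_dim, proj_dim),
)
```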

LTH14 commented 1 month ago

Note that the initialization of the pre-trained encoder (and its MLP head) happens before the pre-trained weights are loaded, as shown here: https://github.com/LTH14/rcg/blob/main/pixel_generator/mage/models_mage.py#L263-L278. The checkpoint therefore overwrites the head's random initialization, so the head is no longer random by the time it is frozen.
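To illustrate that order of operations, here is a hedged sketch continuing from the head-swap example above; the checkpoint filename and the 'module.base_encoder.' key prefix follow the public MoCo v3 release format and are assumptions, not taken from this repo:

```python
import torch

# 1) The encoder and its MLP head above start out randomly initialized.
# 2) Loading the MoCo v3 checkpoint then overwrites that random init,
#    including the head's weights, so nothing stays randomly initialized.
checkpoint = torch.load('moco_v3_vits.pth', map_location='cpu')  # illustrative path
state_dict = {
    k.replace('module.base_encoder.', ''): v
    for k, v in checkpoint['state_dict'].items()
    if k.startswith('module.base_encoder.')
}
msg = encoder.load_state_dict(state_dict, strict=False)
print(msg.missing_keys)  # ideally empty: the head.* weights should also be loaded

# 3) Only after loading are the parameters frozen, so requires_grad=False
#    fixes pre-trained weights, not random ones.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()
```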