Lightning-Universe / lightning-bolts

Toolbox of models, callbacks, and datasets for AI/ML researchers.
https://lightning-bolts.readthedocs.io
Apache License 2.0

MoCo momentum update doesn't update head parameters #1059

Open SebastianGer opened 1 year ago

SebastianGer commented 1 year ago

🐛 Bug

In the MoCo implementation, we need to add a projection head to get the MoCo v2 version of the method. However, in the momentum update, only the encoder's parameters are updated, not the head's. In version 0.5.0 this was not an issue, since the head replaced encoder.fc; the momentum update of all encoder parameters therefore also updated the head. In its current form, the head is an attribute of MoCo, but not of the encoder, so it is skipped by the update.

The original MoCo implementation, on which the bolts implementation is based, also replaces encoder.fc: https://github.com/facebookresearch/moco/blob/main/moco/builder.py

The issue could be resolved by having the head replace the fc layer again, or by explicitly momentum-updating the head.
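A minimal sketch of the second option (explicitly momentum-updating the head alongside the encoder). The class and attribute names (`encoder_q`, `encoder_k`, `head_q`, `head_k`, `momentum_update`) are illustrative stand-ins, not the exact pl_bolts names:

```python
import copy

import torch
from torch import nn


class MoCoSketch(nn.Module):
    """Toy MoCo skeleton with a separate projection head.

    Illustrates the fix: the momentum update must cover the head
    parameters too, not only the encoder's.
    """

    def __init__(self, base_encoder: nn.Module, head: nn.Module, momentum: float = 0.999):
        super().__init__()
        self.momentum = momentum
        self.encoder_q = base_encoder
        self.encoder_k = copy.deepcopy(base_encoder)
        self.head_q = head
        self.head_k = copy.deepcopy(head)
        # Key network is updated by momentum only, never by gradients.
        for p in list(self.encoder_k.parameters()) + list(self.head_k.parameters()):
            p.requires_grad = False

    @torch.no_grad()
    def momentum_update(self) -> None:
        # Update BOTH the encoder and the head. Updating only the
        # encoder (the reported bug) leaves head_k frozen at its
        # initial weights forever.
        for q_mod, k_mod in [(self.encoder_q, self.encoder_k), (self.head_q, self.head_k)]:
            for pq, pk in zip(q_mod.parameters(), k_mod.parameters()):
                pk.data = pk.data * self.momentum + pq.data * (1.0 - self.momentum)
```

After one `momentum_update()` call, each key parameter equals `momentum * key + (1 - momentum) * query`, for the head as well as the encoder.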

To Reproduce

I have not run any experiments to confirm this.

Code sample

The heads are set here: https://github.com/Lightning-Universe/lightning-bolts/blob/master/src/pl_bolts/models/self_supervised/moco/moco_module.py#L165

But not updated here: https://github.com/Lightning-Universe/lightning-bolts/blob/master/src/pl_bolts/models/self_supervised/moco/moco_module.py#L261

Expected behavior

The momentum update should update all parameters, including those belonging to the head.

Environment

Installed via pip:

lightning==2.0.5
lightning-bolts @ git+https://github.com/PytorchLightning/lightning-bolts.git@4f910f6c9893eaf206eed6b7df545c8a02043a55
lightning-cloud==0.5.37
lightning-utilities==0.9.0
pytorch-lightning==1.9.5

Additional context

A likely reason this problem occurred is that MoCo uses the torchvision resnet implementation, while e.g. SimSiam uses the pl_bolts one. In torchvision, the fc layer is actually called in the forward pass, while in pl_bolts it seems to be unused. Assuming the pl_bolts version, it therefore makes sense to add the head separately, since replacing the fc layer would do nothing.
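The distinction can be illustrated with a toy torchvision-style encoder, where `fc` is part of `forward()`. Swapping `fc` for an MLP head (the MoCo v2 trick used in facebookresearch/moco) keeps the head inside `encoder.parameters()`, so any loop over the encoder's parameters covers it automatically. `TinyEncoder` is a hypothetical stand-in, not a pl_bolts class:

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Stand-in for a torchvision-style backbone: fc is used in forward()."""

    def __init__(self):
        super().__init__()
        self.body = nn.Linear(8, 8)
        self.fc = nn.Linear(8, 4)  # called in forward, like torchvision resnets

    def forward(self, x):
        return self.fc(torch.relu(self.body(x)))


encoder = TinyEncoder()
# Replace fc with a projection head; because fc is called in forward(),
# the head's weights are now part of encoder.parameters() and would be
# reached by a momentum update that iterates over the encoder only.
encoder.fc = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
```

If instead the backbone never calls its `fc` attribute (as the issue suggests for the pl_bolts resnet), this replacement is a no-op, and the head has to be attached and updated separately.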

Borda commented 1 year ago

Thank you for this finding @SebastianGer, would you be interested in investigating how to fix it and send a PR? :rabbit: