facebookresearch / suncet

Code to reproduce the results in the FAIR research papers "Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples" https://arxiv.org/abs/2104.13963 and "Supervision Accelerates Pre-training in Contrastive Semi-Supervised Learning of Visual Representations" https://arxiv.org/abs/2006.10803
MIT License

[Question] Using h as the target vector #33

Closed chaddy1004 closed 2 years ago

chaddy1004 commented 2 years ago

Dear all,

I had a question about the code.

https://github.com/facebookresearch/suncet/blob/731547d727b8c94d06c08a7848b4955de3a70cea/src/paws_train.py#L313

In this line, the authors get the target views and target supports from the `h` vector, but they get the anchor views and anchor supports from the `z` vector a few lines above.

I believe that `h` is the vector before the pred head and `z` is the vector after the pred head, but in the paper, it seems like `z` for the anchor and `z` for the positive view (target) have to come from the same layer.

Can anyone explain why the code is written in this way?

Thank you~!

MidoAssran commented 2 years ago

Hi @chaddy1004,

Yes, that's correct. We always take the target views and supports to be the output of the projection head:

https://github.com/facebookresearch/suncet/blob/731547d727b8c94d06c08a7848b4955de3a70cea/src/paws_train.py#L430-L439
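
Here's a minimal toy sketch of the target branch (not the repo's exact code; `encoder`, `projection_head`, `target_images`, and `num_views` below are all placeholder names and shapes I made up for illustration):

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the real networks (names/shapes are toy only)
encoder = torch.nn.Linear(32, 16)          # placeholder backbone
projection_head = torch.nn.Linear(16, 8)   # placeholder projection head
target_images = torch.randn(12, 32)        # 12 flattened toy "images"
num_views = 8                              # first 8 rows: views; rest: supports

# Target branch: targets are the (normalized) outputs of the projection
# head, computed without gradients flowing back through the target pass.
with torch.no_grad():
    h = F.normalize(projection_head(encoder(target_images)), dim=-1)
    target_views, target_supports = h[:num_views], h[num_views:]
```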

For closer comparison with other self-supervised methods (e.g., BYOL & SimSiam), we also provide the option to use an additional prediction head on top to process the anchor views and supports. This can just be set with the `use_pred` flag:

https://github.com/facebookresearch/suncet/blob/731547d727b8c94d06c08a7848b4955de3a70cea/src/paws_train.py#L441-L451
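
Schematically, continuing the toy setup above (again, placeholder names, not the actual implementation), the anchor branch just adds an optional head on top of the same projection output:

```python
# Anchor branch: z is h passed through an extra prediction head when
# use_pred is True; otherwise z is simply h itself (same layer as targets).
prediction_head = torch.nn.Linear(8, 8)    # placeholder prediction head
anchor_images = torch.randn(12, 32)        # 12 flattened toy "images"
use_pred = True

h = projection_head(encoder(anchor_images))   # gradients flow for anchors
z = prediction_head(h) if use_pred else h
z = F.normalize(z, dim=-1)
anchor_views, anchor_supports = z[:num_views], z[num_views:]
```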

However, if you set the `use_pred` flag to False, then both `z` and `h` come from the same layer, as you indicated:

https://github.com/facebookresearch/suncet/blob/731547d727b8c94d06c08a7848b4955de3a70cea/src/paws_train.py#L291-L297
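
To see where the two meet, here is a rough sketch of the soft nearest-neighbour classifier described in the paper (continuing the toy setup; variable names and the toy labels are illustrative): the anchor prediction, computed from `z`, is trained to match the sharpened target prediction, computed from `h`.

```python
support_labels = torch.eye(4)  # one-hot labels for the 4 toy support samples

def snn(queries, supports, labels, tau=0.1):
    # soft nearest-neighbour classifier: similarity-weighted average of
    # the one-hot support labels
    return torch.softmax(queries @ supports.T / tau, dim=1) @ labels

probs = snn(anchor_views, anchor_supports, support_labels)        # uses z
with torch.no_grad():
    targets = snn(target_views, target_supports, support_labels)  # uses h
    targets = targets ** (1.0 / 0.25)          # sharpen (temperature 0.25)
    targets /= targets.sum(dim=1, keepdim=True)
# cross-entropy between anchor predictions and sharpened targets
loss = -(targets * torch.log(probs + 1e-8)).sum(dim=1).mean()
```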

We do keep the prediction head in our default setup because it seems to make training more robust to the choice of learning rate, but as you can see in Table 5 in the paper, we get better performance by not using a prediction head. We also do all of our CIFAR10 experiments without a prediction head.

MidoAssran commented 2 years ago

@chaddy1004 does that answer your question? If so, I'll close the issue. If not, just let me know; I'm happy to answer any follow-up questions!

chaddy1004 commented 2 years ago

Oh yes!!! I completely forgot to reply! Thank you so much for your clarification; I really appreciate it! I will ask more questions when I encounter them, haha. Thank you!