Closed: chaddy1004 closed this issue 2 years ago
Hi @chaddy1004,
Yes, that's correct. We always take the target views and supports to be the output of the projection head.
For closer comparison with other self-supervised methods (e.g., BYOL and SimSiam), we also provide the option to use an additional prediction head on top to process the anchor views and supports. This can be set with the `use_pred` flag.
However, if you set the `use_pred` flag to `False`, then both z and h come from the same layer, as you indicated.
We do keep the prediction head in our default setup because it seems to make training more robust to the choice of learning rate, but as you can see in Table 5 in the paper, we get better performance by not using a prediction head. We also do all of our CIFAR10 experiments without a prediction head.
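To make the data flow concrete, here is a minimal toy sketch of the asymmetry described above (this is not the actual suncet code; the linear maps and names below are made up stand-ins for the heads). Targets (h) always come straight from the projection head, while anchors (z) additionally pass through the prediction head only when `use_pred` is on:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two heads: each is just a fixed linear map.
proj_w = rng.standard_normal((8, 4))  # projection head: 8-dim -> 4-dim
pred_w = rng.standard_normal((4, 4))  # prediction head: 4-dim -> 4-dim

def forward(x, use_pred):
    h = x @ proj_w          # projection-head output: used for target views/supports
    z = h @ pred_w if use_pred else h  # anchors get the extra prediction head
    return z, h

x = rng.standard_normal((2, 8))

z, h = forward(x, use_pred=False)
print(np.allclose(z, h))    # True: with use_pred=False, z and h are the same layer

z, h = forward(x, use_pred=True)
print(np.allclose(z, h))    # False: anchors now differ from the targets' layer
```

The design choice mirrors BYOL/SimSiam: the extra head breaks the symmetry between the anchor branch and the (stop-gradient) target branch.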
@chaddy1004 does that clarify your question? If so, I'll close the issue. If not, just let me know; I'm happy to answer any follow-up questions!
Oh yes!!! I completely forgot to reply! Thank you so much for the clarification; I really appreciate it! I'll ask more questions as I run into them, haha. Thank you!
Dear all,
I had a question about the code.
https://github.com/facebookresearch/suncet/blob/731547d727b8c94d06c08a7848b4955de3a70cea/src/paws_train.py#L313
In this line, the authors get the target views and target supports from the h vector, but they get the anchor views and anchor supports from the z vector a few lines above.
I believe that h is the vector from before the prediction head and z is the vector from after the prediction head, but in the paper it seems that the z for the anchor and the z for the positive (target) view have to come from the same layer.
Can anyone explain why the code is written in this way?
Thank you~!