facebookresearch / simsiam

PyTorch implementation of SimSiam: https://arxiv.org/abs/2011.10566

General question regarding weight sharing and stop gradient #27

Closed ramchandracheke closed 2 years ago

ramchandracheke commented 2 years ago

Hi,

First of all, thank you very much for this paper and code. I have been reading about Siamese networks, and I have one general question regarding weight sharing and the stop-gradient. It would be great if you could help me.

Question: Suppose we start training by initialising the network with He-normal weights. We then have two branches with a ResNet-50 encoder, and their weights are the same. For the first forward pass, the loss is calculated and the error is back-propagated through only the branch that contains the prediction layer. So the weights are updated in that one branch, but since the other branch shares its weights, it will also end up with the same weights, right?

A small update. Please correct me if my understanding is wrong: since we calculate a single loss, its value would otherwise back-propagate to both branches in exactly the same way, so you introduced the stop-gradient. With this method, it is ensured that at the start of the second iteration the weights in both branches are still the same (see the rough sketch below).

This may be a very stupid question, but it would be great if you could help me. Thanks.
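To make the question concrete, here is a rough sketch of my understanding of one training step, loosely following the pseudocode in the paper; the names `f`, `h`, and `D` come from there, everything else (the toy layer sizes and random inputs) is just my approximation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def D(p, z):
    # Negative cosine similarity; the stop-gradient is simply z.detach(),
    # so the gradient flows back only through the predictor side (p).
    z = z.detach()
    return -F.cosine_similarity(p, z, dim=1).mean()

# Stand-ins for the real networks (names follow the paper's pseudocode):
# f = backbone + projection MLP, h = prediction MLP.
f = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
h = nn.Sequential(nn.Linear(16, 16))

# Two augmented views of the same batch (here just random tensors).
x1, x2 = torch.randn(8, 32), torch.randn(8, 32)

z1, z2 = f(x1), f(x2)                  # the SAME weights f process both views
p1, p2 = h(z1), h(z2)
loss = D(p1, z2) / 2 + D(p2, z1) / 2   # symmetric loss
loss.backward()                        # f and h receive gradients only via p1, p2
```

Is it correct to say that, because there is only one `f` here, "weight sharing" just means the same parameters are used for both views, so there is never a second copy of the weights that could get out of sync?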

endernewton commented 2 years ago

Not sure what you mean, but note that each image is randomly augmented into different views before being fed into the network. Otherwise SimSiam (actually any Siamese-network-based method) will not learn, because it would not need any learning.
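For example, sampling the same augmentation pipeline twice gives two different views of one image; the specific transforms and the image path below are only illustrative, not the exact recipe used in this repo:

```python
from PIL import Image
from torchvision import transforms

# Illustrative augmentation pipeline (not necessarily the repo's exact recipe).
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

img = Image.open("some_image.jpg")   # hypothetical image path
x1, x2 = augment(img), augment(img)  # two different random views of the same image
```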

ramchandracheke commented 2 years ago

Thank you for your reply.

Suppose we start training by initialising the network with He-normal weights. We then have two branches with a ResNet-50 encoder, and their weights are the same. For the first forward pass, we feed different views to the network and the loss is calculated from the cosine similarity.

Question: Once this error is back-propagated through only the branch that contains the prediction layer, the weights are updated in that one branch; but since the other branch shares its weights, it will also end up with the same weights, right?

Thanks