PatrickHua / SimSiam

A PyTorch implementation of the paper 'Exploring Simple Siamese Representation Learning'
MIT License

There should be a stop-gradient in the SimSiam model #39

Open nikheelpandey opened 3 years ago

nikheelpandey commented 3 years ago

Hello,

I was using your implementation of SimSiam for contrastive learning. I noticed that the model you have created has a few problems:

  1. The "stop_gradient" part of the network is absent from your implementation. This model is effectively training both paths.

Could you please clarify how and where you are taking care of it?
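
For context, Algorithm 1 in the SimSiam paper applies the stop-gradient inside the loss: the target projection is detached before the similarity is computed, so only the prediction branch receives gradients. A minimal sketch of that training step (here `f`, `h`, and `aug` are placeholders for the encoder, predictor, and augmentation, not names from this repo):

    import torch.nn.functional as F

    def D(p, z):  # negative cosine similarity, as in the paper
        z = z.detach()  # stop gradient: z is treated as a constant target
        return -F.cosine_similarity(p, z, dim=-1).mean()

    def training_step(f, h, x, aug):
        x1, x2 = aug(x), aug(x)      # two augmented views
        z1, z2 = f(x1), f(x2)        # projections
        p1, p2 = h(z1), h(z2)        # predictions
        return D(p1, z2) / 2 + D(p2, z1) / 2  # symmetrized loss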

hhhdw commented 3 years ago

    import torch.nn.functional as F

    def D(p, z, version='simplified'):  # negative cosine similarity
        if version == 'original':
            z = z.detach()  # stop gradient
            p = F.normalize(p, dim=1)  # l2-normalize
            z = F.normalize(z, dim=1)  # l2-normalize
            return -(p * z).sum(dim=1).mean()
        elif version == 'simplified':  # same thing, much faster; see the speed test in __main__
            return -F.cosine_similarity(p, z.detach(), dim=-1).mean()
        else:
            raise Exception

There is a `detach` applied to `z` when computing the loss; that is the stop-gradient.
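
One way to verify this is to run the loss on toy tensors and see where gradients land (a minimal sketch, assuming random inputs in place of real projections and predictions):

    import torch
    import torch.nn.functional as F

    p = torch.randn(4, 8, requires_grad=True)  # stands in for the predictions
    z = torch.randn(4, 8, requires_grad=True)  # stands in for the target projections

    loss = -F.cosine_similarity(p, z.detach(), dim=-1).mean()
    loss.backward()

    print(p.grad is not None)  # True: gradients reach the prediction branch
    print(z.grad)              # None: detach() stops gradients to the target branch

So training only updates the path through the predictor, which is the behavior the paper prescribes.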