facebookresearch / VICRegL

VICRegL official code base

Classification loss #9

Closed ds2268 closed 1 year ago

ds2268 commented 1 year ago

@Adrien987k thank you for the work and the implementation.

I have noticed that you are also using a classification loss, in addition to the VICReg and VICRegL losses. The classification loss is not mentioned in the paper and was not used in the original VICReg either.

At first glance this seems a bit off: labels won't be available when pretraining in the wild, and it defeats the purpose of "self-supervised" pre-training, doesn't it?

ds2268 commented 1 year ago

I have prepared a pull request that should fix the issue (https://github.com/facebookresearch/VICRegL/pull/10). I think the classification loss should not be added to the global loss.

ds2268 commented 1 year ago

I see that you have used detach, so the learned representation is not affected by gradients coming from the classification loss. Still, this is ugly, and the reported loss value does not represent the actual VICRegL loss.
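
To illustrate the pattern being discussed, here is a minimal sketch (hypothetical modules, not the repository's exact code): the classification head is trained on detached features, so its gradients never reach the backbone, but its cross-entropy term is still summed into the reported loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for the VICRegL backbone and the linear classification head.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
classifier = nn.Linear(128, 10)

x = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

features = backbone(x)

# Dummy term standing in for the actual VICRegL objective
# (invariance + variance + covariance + local terms), just to keep the sketch runnable.
vicregl_loss = features.pow(2).mean()

# The classifier sees detached features, so the cross-entropy gradient
# stops at the head and never updates the backbone.
logits = classifier(features.detach())
cls_loss = F.cross_entropy(logits, labels)

# Summing both terms means the logged value is VICRegL loss + classification loss,
# even though only the VICRegL part drives the backbone.
total_loss = vicregl_loss + cls_loss
total_loss.backward()
```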

jinx2018 commented 1 year ago


Hi! Have you ever tried to reproduce the models without the classification loss?

Adrien987k commented 1 year ago

Hi,

Thanks for pointing this out. As you mentioned, there is a detach operation preventing the gradient from propagating from the classification head to the pretrained backbone.

There is absolutely no difference between having this head and not having it. The VICRegL loss is simply the total loss minus the classification loss, so it does not change the learning dynamics.
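
To make the point concrete, a quick sketch (hypothetical modules, same assumptions as above) checking that the backbone gradients are identical with and without the detached classification head:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
backbone = nn.Linear(16, 8)
classifier = nn.Linear(8, 10)
x = torch.randn(4, 16)
labels = torch.randint(0, 10, (4,))

def backbone_grad(include_cls_head: bool) -> torch.Tensor:
    backbone.zero_grad()
    feats = backbone(x)
    loss = feats.pow(2).mean()  # stand-in for the VICRegL loss
    if include_cls_head:
        # Classification term computed on detached features.
        loss = loss + F.cross_entropy(classifier(feats.detach()), labels)
    loss.backward()
    return backbone.weight.grad.clone()

# The gradients reaching the backbone are the same in both cases.
assert torch.allclose(backbone_grad(True), backbone_grad(False))
```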