Closed ds2268 closed 1 year ago
I have prepared a pull request that should fix the issue (https://github.com/facebookresearch/VICRegL/pull/10). I think the classification loss should not be added to the global loss.
I see that you used detach, so the learned representation was not affected by gradients coming from the classification loss. Still, this is ugly, and the reported loss value does not represent the actual VICRegL loss.
@Adrien987k thank you for the work and the implementation.
I have noticed that you are also using a classification loss, in addition to the VICReg and VICRegL losses. This classification loss is not mentioned in the paper and was not used in the original VICReg either.
At first glance this seems a bit off: you won't have labels available when pretraining in the wild, and it defeats the purpose of "self-supervised" pre-training.
Hi! Have you ever tried to reproduce the models without the classification loss?
Hi,
Thanks for pointing this out. As you mentioned, there is a detach operation preventing the gradient from propagating from the classification head to the pretrained backbone.
There is absolutely no difference between having this head and not having it: the VICRegL loss is simply the total loss minus the classification loss, and removing the head would not change the learning dynamics.
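To illustrate the point about detach, here is a minimal, hypothetical sketch (the module names are illustrative stand-ins, not the actual VICRegL code): a classification head trained on detached features leaves the backbone gradients exactly as the self-supervised loss alone would produce them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
backbone = nn.Linear(8, 4)   # stand-in for the pretrained backbone
probe = nn.Linear(4, 10)     # stand-in for the online classification head

x = torch.randn(2, 8)
y = torch.randint(0, 10, (2,))

feats = backbone(x)
ssl_loss = feats.pow(2).mean()  # stand-in for the VICRegL loss

# Classification loss computed on *detached* features:
# gradients stop here and never reach the backbone.
cls_loss = nn.functional.cross_entropy(probe(feats.detach()), y)

total_loss = ssl_loss + cls_loss
total_loss.backward()

# backbone.weight.grad is now identical to what ssl_loss alone
# would have produced; only the logged total_loss value differs.
```

So the head only affects the reported loss value, not the learned representation, which is exactly why subtracting the classification loss recovers the pure VICRegL loss.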