facebookresearch / suncet

Code to reproduce the results in the FAIR research papers "Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples" https://arxiv.org/abs/2104.13963 and "Supervision Accelerates Pre-training in Contrastive Semi-Supervised Learning of Visual Representations" https://arxiv.org/abs/2006.10803
MIT License

Negative loss #21

Closed chrishendra93 closed 2 years ago

chrishendra93 commented 2 years ago

Hi, I really like the idea of this paper: not only is it simple, it also leverages a small amount of labelled data instead of being completely unsupervised. I have been trying to incorporate the PAWS idea into my own project, which has a small number of images but with segmentation labels.

So far I have been having problems with the loss going negative, because the entropy of the average sharpened probability tends to reach its maximum easily. Am I right in saying that there is no theoretical guarantee that the mean sharpened-probability entropy term will always be smaller than the cross-entropy loss, and did you encounter this problem while training PAWS?
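To make what I mean concrete, here is a toy illustration (not the actual PAWS loss code, just the shape loss = cross-entropy - H(avg_probs) that I am assuming): even with confident, well-matched predictions, the entropy of the average prediction can dominate and push the total loss below zero.

```python
import torch

# Toy example of a PAWS-style objective: loss = CE - H(avg_probs).
# Three confident predictions that each match their target exactly.
probs = torch.tensor([[0.98, 0.01, 0.01],
                      [0.01, 0.98, 0.01],
                      [0.01, 0.01, 0.98]])
targets = probs  # perfectly matched -> cross-entropy is small

# Cross-entropy between targets and predictions (per sample, then averaged).
ce = -torch.mean(torch.sum(targets * torch.log(probs), dim=1))

# Average prediction is (close to) uniform, so its entropy is near log(3).
avg_probs = probs.mean(dim=0)
entropy = -torch.sum(avg_probs * torch.log(avg_probs))

loss = ce - entropy
print(ce.item(), entropy.item(), loss.item())  # loss comes out negative
```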

MidoAssran commented 2 years ago

Hi @chrishendra93, thanks for your interest.

Yes, exactly. That's totally normal and not a problem. The loss is still being minimized, regardless of its value.

If you really don't like seeing a negative loss, you can add a constant to the me-max regularizer; it won't change the gradient at all. Maximizing the entropy of the average prediction is equivalent to minimizing the KL divergence to the uniform distribution, so you can add `math.log(len(avg_probs))` to the me-max regularizer here (i.e., turning it into that KL divergence). This has no effect on training and gives you the same results, but ensures that the loss is always positive.
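For concreteness, a minimal sketch of that shift, assuming a me-max term of the form -H(avg_probs); the function and variable names here are illustrative, not the exact ones in this repo:

```python
import math
import torch

def memax_regularizer(avg_probs: torch.Tensor) -> torch.Tensor:
    """Me-max regularizer written as KL(avg_probs || uniform).

    -H(avg_probs) + log(K) equals the KL divergence from the average
    prediction to the uniform distribution over K classes, which is
    always >= 0, so the added constant shifts the loss value without
    changing any gradients.
    """
    neg_entropy = torch.sum(avg_probs * torch.log(avg_probs + 1e-12))
    return neg_entropy + math.log(len(avg_probs))
```

Since the cross-entropy term is itself non-negative, a total loss of the form `cross_entropy + memax_regularizer(avg_probs)` then stays non-negative as well, while the minimizer and the training dynamics are unchanged.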