Closed chrishendra93 closed 3 years ago
Hi @chrishendra93, thanks for your interest.
Yes, exactly, that's totally normal and not a problem. The loss is still being minimized regardless of its actual value.
If you really don't like seeing a negative loss, you can add a constant to the me-max regularizer; it won't change the gradient at all. For example, maximizing the entropy of the average prediction is equivalent to minimizing its KL divergence to the uniform distribution. Therefore, you can add `math.log(len(avg_probs))`
to the me-max regularizer here (i.e., turning it into the KL divergence), and it will have no effect on the actual training: you get the same results, but the loss term is always non-negative.
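To make the equivalence concrete, here is a minimal NumPy sketch (the actual PAWS code uses PyTorch tensors; the function name `me_max_loss` and the `shift` flag are my own for illustration). It uses the identity KL(p || uniform) = log(K) - H(p), so adding `math.log(len(avg_probs))` shifts the negative-entropy term into a KL divergence that is always >= 0 without changing its gradient:

```python
import math
import numpy as np

def me_max_loss(avg_probs, shift=False):
    """Negative entropy of the average prediction (the me-max regularizer).

    With shift=True, adds log(K), turning the term into
    KL(avg_probs || uniform), which is >= 0 but has the same gradient,
    since the added constant does not depend on avg_probs.
    """
    entropy = -np.sum(avg_probs * np.log(avg_probs + 1e-12))
    loss = -entropy
    if shift:
        loss += math.log(len(avg_probs))  # constant w.r.t. avg_probs
    return loss

avg_probs = np.array([0.7, 0.2, 0.1])
raw = me_max_loss(avg_probs)                  # negative (entropy > 0)
shifted = me_max_loss(avg_probs, shift=True)  # non-negative KL form
```

The shifted term is exactly zero when the average prediction is uniform (the entropy-maximizing case), which is why the original unshifted term bottoms out at `-log(K)` rather than 0.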
Hi, I really like the idea of this paper, not just because it is simple, but because it leverages a small amount of labelled data instead of being completely unsupervised. I have been trying to incorporate the PAWS idea into my own project, which has a small number of images but with segmentation labels.
So far I have been having problems with the loss going negative, because the entropy of the average sharpened probability tends to reach its maximum easily. Am I right in saying that there is no theoretical guarantee that the mean sharpened-probability entropy term will always be smaller than the cross-entropy loss, and did you encounter this problem while training PAWS?