Zhudongsheng75 opened this issue 2 years ago
Please check the updated losses.py in the pull requests.
I am also facing the same issue.
The suggested fix also doesn't help:
https://colab.research.google.com/drive/14IJ_xrfOexa7X_uM7dURT-itoVJoLjo2?usp=sharing
I have also met this problem. In my opinion, the line that calculates log_prob is not robust enough: there is a small chance that it evaluates log(0), which produces nan in the loss. https://github.com/HobbitLong/SupContrast/blob/a8a275b3a8b9b9bdc9c527f199d5b9be58148543/losses.py#L89 It would be better to rewrite it as: log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True) + 1e-6)
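For anyone debugging this, here is a minimal, self-contained sketch (not the repo's code; the values are made up) of how the underflow produces nan and why the epsilon helps:

```python
import torch

# When every similarity in a row is a very large negative number, exp_logits
# underflows to zero, the row sum is 0, and log(0) = -inf turns the loss into
# nan downstream once the mask is applied.
logits = torch.tensor([[-9000.0, -9000.0, -9000.0]])
exp_logits = torch.exp(logits)                              # underflows to all zeros
print(torch.log(exp_logits.sum(1, keepdim=True)))           # tensor([[-inf]])
print(torch.log(exp_logits.sum(1, keepdim=True) + 1e-6))    # finite, ~ -13.8
```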
I have also met this problem. I use the SupCon loss with ResNet-12 as the encoder and set temperature=0.07, and I printed out logits at "/SupContrast/blob/master/losses.py" line 74. When the features were not normalized, the logits were close to -9000, and at "/SupContrast/blob/master/losses.py" line 88 the exp_logits elements were all zeros. When the features were normalized, the logits were close to -10 and the exp_logits were around 1e-07. Maybe temperature > 1 and normalize(features) can help.
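To illustrate what I mean by normalizing, here is a small sketch (shapes and values are made up; dim=-1 is the feature axis of the [bsz, n_views, dim] tensor that SupConLoss expects):

```python
import torch
import torch.nn.functional as F

# L2-normalize each feature vector so the dot-product logits stay bounded by
# 1/temperature instead of reaching values like -9000 that underflow exp().
bsz, n_views, dim = 32, 2, 128
raw = 100 * torch.randn(bsz, n_views, dim)     # stand-in for unnormalized projector output
features = F.normalize(raw, dim=-1)            # unit L2 norm along the feature dimension
print(features.norm(dim=-1))                   # all ones
```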
I have also faced this issue. Apart from the solutions mentioned above, I would suggest the following change:
In line 92: mean_log_prob_pos = (mask * log_prob).sum(1) / mask.sum(1)
It is possible for all the elements in one row of mask to be zero, which causes a division by zero and results in nan loss values. I personally made the following modification, which worked for me (see the sketch below): mean_log_prob_pos = (mask * log_prob).sum(1) / (mask.sum(1) + 1e-6)
I would also add that the input features need to be normalized, as pointed out earlier in the comments.
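A tiny sketch of the failure mode and the fix (names follow losses.py; the values are made up):

```python
import torch

# If an anchor has no positives in the batch, its row of `mask` is all zeros,
# so mask.sum(1) is 0 and the mean over positives becomes 0/0 = nan.
# The epsilon keeps the division finite; such anchors then contribute ~0 loss.
mask = torch.tensor([[0., 1., 0.],
                     [0., 0., 0.]])            # second anchor has no positives
log_prob = torch.randn(2, 3)
mean_log_prob_pos = (mask * log_prob).sum(1) / (mask.sum(1) + 1e-6)
print(mean_log_prob_pos)                       # finite; second entry is exactly 0
```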
Hi! @Jf-Chen, just as a clarification, when you say normalize the feature with normalize(features) you mean along the feature dimension right (so here 128), not batch normalize?
I know little about normalization methods, but my purpose for normalizing is clear: limit the values of the features and avoid large negative values such as -9000.
For better understanding, take a feature tensor of shape [128, 640] as an example; it is [batch_size, dim], the output of the projector in SupCon.
logits_unnorm = self.proj(features_all) # logits_unnorm is [128,640]
logits = F.normalize(logits_unnorm, dim=1)
The PyTorch documentation says "dim" is the dimension to be reduced and that indexing starts from zero, so reducing over the 640-dim axis means using "dim=1".
I am not sure whether my understanding is right, but it works well: no bugs, and satisfactory accuracy.
Is this called batch normalization? I am not sure; there are too many formulas on the wiki. :eyes:
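For reference, a quick sketch of the difference using the [128, 640] shapes from above: F.normalize(x, dim=1) is per-sample L2 normalization along the feature axis, which is not batch normalization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(128, 640)                       # [batch_size, dim] projector output

l2 = F.normalize(x, dim=1)                      # each row rescaled to unit L2 norm
print(l2.norm(dim=1)[:3])                       # ~1.0 for every sample

bn = nn.BatchNorm1d(640)                        # batch norm standardizes each feature
out = bn(x)                                     # across the batch instead
print(out.mean(dim=0)[:3], out.std(dim=0)[:3])  # ~0 mean, ~1 std per feature column
```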
I want to use this loss function, but it returns nan when I run the test code. Can anyone tell me why? Thanks!
Add F.normalize(features, dim=1), and it works.
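A minimal sketch of what the fixed test call might look like (batch size, number of views, and feature dimension are made up; use dim=-1 on a [bsz, n_views, dim] tensor, or dim=1 on a flat [bsz*n_views, dim] tensor as in the line above):

```python
import torch
import torch.nn.functional as F
from losses import SupConLoss                  # this repo's loss

bsz, n_views, dim = 16, 2, 128
features = torch.randn(bsz, n_views, dim)      # stand-in for encoder/projector outputs
features = F.normalize(features, dim=-1)       # L2-normalize along the feature dimension
labels = torch.randint(0, 4, (bsz,))

criterion = SupConLoss(temperature=0.07)
loss = criterion(features, labels)
print(loss)                                    # finite, no nan
```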