HobbitLong / SupContrast

PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)
BSD 2-Clause "Simplified" License

Why the func returns "nan"? #104

Open Zhudongsheng75 opened 2 years ago

Zhudongsheng75 commented 2 years ago

I want to use this loss func, but it returns 'nan' when I run the test code. Can anyone tell me why? Thanks!

[Screenshots: 2022-03-18 12:50:33 PM and 12:50:39 PM]
evechny131 commented 2 years ago

Please check the updated losses.py in the pull requests.

ksivajana commented 2 years ago

I am also facing the same issue.

The suggested fix also doesn't help:

https://colab.research.google.com/drive/14IJ_xrfOexa7X_uM7dURT-itoVJoLjo2?usp=sharing

yuanlonghui commented 2 years ago

I have also met this problem. In my opinion, the line that calculates log_prob is not robust enough: there is a small probability that it computes log(0), which actually produces nan in the loss. https://github.com/HobbitLong/SupContrast/blob/a8a275b3a8b9b9bdc9c527f199d5b9be58148543/losses.py#L89

It is better rewritten as:

log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True) + 1e-6)
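For context, a minimal sketch of where that guard sits in the loss computation (the function name is illustrative; logits_mask and the other variable names follow losses.py, and the eps value is only an example):

import torch

def masked_log_prob(logits, logits_mask, eps=1e-6):
    # keep only the non-self pairs when summing the denominator
    exp_logits = torch.exp(logits) * logits_mask
    # eps keeps the log finite even if an entire row of exp_logits underflows to zero
    return logits - torch.log(exp_logits.sum(1, keepdim=True) + eps)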

Jf-Chen commented 2 years ago

I have also met this problem. I use the SupCon loss with ResNet-12 as the encoder and set temperature=0.07, and I printed the logits at losses.py line 74. When the features were not normalized, the logits were close to -9000, and at losses.py line 88 the exp_logits were all zeros. When the features were normalized, the logits were close to -10, and the exp_logits were around 1e-7. Maybe temperature > 1 and normalize(features) can help.
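As a quick illustration of the underflow described above (the -9000 value is just the example from that comment):

import torch

x = torch.tensor(-9000.0)
torch.exp(x)                            # tensor(0.) -- exp underflows to exactly zero in float32
log_prob = x - torch.log(torch.exp(x))  # -9000 - log(0) = -9000 - (-inf) = inf
torch.tensor(0.0) * log_prob            # 0 * inf = nan, which then poisons the masked mean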

Shubhammawa commented 2 years ago

I have also faced this issue. Apart from the solutions mentioned above I would suggest the following changes:

In line 92: mean_log_prob_pos = (mask * log_prob).sum(1) / mask.sum(1)

It is possible for all the elements in one row of mask to be zero, which causes a division by zero and results in nan loss values. I personally made the following modification, which worked for me:

mean_log_prob_pos = (mask * log_prob).sum(1) / (mask.sum(1) + 1e-6)

Also, as pointed out earlier in the comments, the input features need to be normalized.
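Putting the two eps guards and the input normalization together, a rough sketch of the relevant part of the computation (the function name is illustrative; mask, logits_mask, and the max-subtraction follow losses.py, and eps is only an example value):

import torch
import torch.nn.functional as F

def robust_mean_log_prob_pos(features, mask, logits_mask, temperature=0.07, eps=1e-6):
    # L2-normalize each row so the dot products stay bounded
    features = F.normalize(features, dim=1)
    # pairwise similarities scaled by the temperature
    logits = torch.matmul(features, features.T) / temperature
    # subtract the per-row max for numerical stability, as losses.py already does
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()
    # exponentiate only the allowed (non-self) pairs
    exp_logits = torch.exp(logits) * logits_mask
    # eps keeps the log finite even if an entire row underflows to zero
    log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True) + eps)
    # eps avoids division by zero when a sample has no positives in the batch
    return (mask * log_prob).sum(1) / (mask.sum(1) + eps)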

piconti commented 2 years ago

Hi! @Jf-Chen, just as a clarification: when you say to normalize the features with normalize(features), you mean along the feature dimension, right (so here 128), not batch normalization?

Jf-Chen commented 2 years ago

> Hi! @Jf-Chen, just as a clarification: when you say to normalize the features with normalize(features), you mean along the feature dimension, right (so here 128), not batch normalization?

I know little about normalization methods, but my purpose in normalizing is clear: limit the magnitude of the feature values and avoid large negative values such as -9000.

For better understanding, take a feature tensor of shape [128, 640] as an example; it is [batch_size, dim], the output of the projector in SupCon:

logits_unnorm = self.proj(features_all)  # logits_unnorm is [128, 640]
logits = F.normalize(logits_unnorm, dim=1)

The PyTorch documentation says dim is the dimension to reduce, with indexing starting from zero, so reducing over the 640 features uses dim=1.

I am not sure whether my understanding is right, but it works well: no bugs and satisfactory accuracy.

Is that called batch normalization? I am not sure; there are too many formulas on the wiki. :eyes:
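To make the distinction concrete: F.normalize(x, dim=1) on a [batch_size, dim] tensor is per-sample L2 normalization along the feature dimension, not batch normalization. A small sketch of the difference (shapes follow the [128, 640] example above):

import torch
import torch.nn.functional as F

x = torch.randn(128, 640)          # [batch_size, dim]

l2 = F.normalize(x, dim=1)         # each row rescaled to unit L2 norm
print(l2.norm(dim=1)[:3])          # ~1.0 for every sample

bn = torch.nn.BatchNorm1d(640)(x)  # batch norm: per-feature standardization across the batch
print(bn.mean(dim=0)[:3])          # ~0.0 for every feature column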

HYC01 commented 10 months ago

> I want to use this loss func, but it returns 'nan' when I run the test code. Can anyone tell me why? Thanks!
>
> [Screenshots: 2022-03-18 12:50:33 PM and 12:50:39 PM]

Add F.normalize(features, dim=1), and it works.
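A minimal sanity check along these lines (this assumes the SupConLoss class from this repo's losses.py and its [bsz, n_views, dim] input convention; the shapes and labels are made up):

import torch
import torch.nn.functional as F
from losses import SupConLoss

criterion = SupConLoss(temperature=0.07)
features = torch.randn(8, 2, 128)         # [bsz, n_views, dim], e.g. two augmented views per image
features = F.normalize(features, dim=-1)  # unit-normalize along the feature dimension
labels = torch.randint(0, 4, (8,))        # random labels; with two views, every anchor has at least one positive
loss = criterion(features, labels)
print(torch.isfinite(loss))               # expect tensor(True) once the features are normalized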