Open yuanlonghui opened 2 years ago
This is the simplest way to prevent computing log(0), and it is necessary when the embedding dimension is large. With high-dimensional feature representations, the largest inner product in each row is very likely the inner product of a feature with itself. After subtracting that row maximum, the off-diagonal logits become strongly negative, so after exp() they are very likely to underflow to zero everywhere except the diagonal. Since the diagonal is excluded from the sum inside the log(), the denominator can become exactly zero, log(0) is computed, and the loss turns into NaN.
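For reference, here is a minimal sketch of where the log(0) comes from and one way to guard against it. This assumes a SupCon-style loss in PyTorch; the names `features` and `temperature` and the epsilon value are illustrative, not the repo's exact code:

```python
import torch

def supcon_log_prob(features, temperature=0.07):
    # features: (batch, dim), assumed L2-normalized
    logits = features @ features.T / temperature

    # Numerical stability: subtract the per-row max before exp().
    # In high dimensions the row max is usually the diagonal
    # (self-similarity), so off-diagonal logits become very negative.
    logits_max, _ = logits.max(dim=1, keepdim=True)
    logits = logits - logits_max.detach()

    # Mask out the diagonal so an anchor is not its own negative.
    batch = features.size(0)
    self_mask = torch.eye(batch, dtype=torch.bool, device=features.device)
    exp_logits = torch.exp(logits).masked_fill(self_mask, 0.0)

    # Without a guard, exp_logits.sum(1) can underflow to 0 when all
    # off-diagonal logits are very negative, making log(0) = -inf -> NaN.
    denom = exp_logits.sum(dim=1, keepdim=True)
    log_prob = logits - torch.log(denom + 1e-12)  # epsilon guards log(0)
    return log_prob
```

The epsilon inside the log is the simple fix discussed here; it leaves the result essentially unchanged whenever the denominator is nonzero.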
Thank you very much for solving the NaN loss problem. Does your loss keep increasing as you train? I look forward to your reply!
Did you solve this?