Po-Hsun-Su / pytorch-ssim

pytorch structural similarity (SSIM) loss

Getting NaN after a few iterations #8

Closed bernardohenz closed 6 years ago

bernardohenz commented 6 years ago

Hello,

I am using your SSIM implementation as part of my total objective for denoising. Unfortunately, after a few iterations I start getting NaN values in the objective (this does not happen if I remove the SSIM term from the loss).
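For context, a minimal sketch of how such a combined objective is typically assembled (the network `denoiser`, the weight `alpha`, and the L1 term are illustrative assumptions, not the exact setup):

```python
import torch
import pytorch_ssim  # this repository

# Sketch of a combined denoising objective: a weighted mix of (1 - SSIM) and L1.
# `alpha` and the L1 term are assumptions for illustration.
ssim = pytorch_ssim.SSIM(window_size=11)
l1 = torch.nn.L1Loss()
alpha = 0.8

def total_loss(denoised, clean):
    # SSIM is a similarity (higher is better), so 1 - SSIM acts as the loss term.
    return alpha * (1 - ssim(denoised, clean)) + (1 - alpha) * l1(denoised, clean)
```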

I am wondering if there is any occasion where your implementation may divide by zero, or something that may cause NaN. By just looking at the code, I don't know why this could happen (since C1 and C2 are used to avoid zero-divisions).
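For reference, the standard SSIM definition (which this implementation follows, with small positive constants $C_1$ and $C_2$) keeps both constants in the denominator factors, so a plain division by zero should indeed not occur:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$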

Any thoughts?

Po-Hsun-Su commented 6 years ago

Can you post a snippet that reproduces the problem? The NaN objective could be caused by an exploding gradient, which may or may not come from SSIM. Here is a discussion on debugging NaNs in gradients: https://discuss.pytorch.org/t/solved-debugging-nans-in-gradients/10532. Also try running the latest PyTorch; there could be a bug in PyTorch itself, like this one: https://github.com/pytorch/pytorch/issues/2421.
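A sketch of the hook-based approach from that discussion (the tensor names are hypothetical; on recent PyTorch versions, `torch.autograd.set_detect_anomaly(True)` reports similar information automatically):

```python
import torch

def check_grad(name):
    """Return a backward hook that raises as soon as a NaN gradient appears."""
    def hook(grad):
        if torch.isnan(grad).any():
            raise RuntimeError(f"NaN gradient flowing into {name}")
    return hook

# Hypothetical usage inside the training step: register hooks on suspect
# intermediate tensors, then run backward as usual.
# ssim_out.register_hook(check_grad("ssim_out"))
# denoised.register_hook(check_grad("denoised"))
# loss.backward()
```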

bernardohenz commented 6 years ago

Hello, I've solved it. The problem was that I was using several InstanceNormalization layers; when the network received a patch of constant color (whose pixel variance is close to 0), normalizing it (dividing by the standard deviation of the colors) produced very large numbers, which eventually exploded.

So the problem was really due to some bad data in the training set. The hard part of debugging this was that it was not a straightforward zero-division case, but rather operations returning exploding values.
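A minimal sketch of why a constant-color patch is hazardous for instance normalization (the exact layer and data from the training run aren't shown here, so this uses a hand-rolled per-channel standardization to make the numerics visible):

```python
import torch
import torch.nn as nn

# An exactly constant patch: every pixel has the same value, so its std is 0.
patch = torch.full((1, 3, 8, 8), 0.5)

# Standardizing without a variance floor: 0 / 0 yields NaN, which then
# propagates through the rest of the network and into the loss.
mean = patch.mean(dim=(2, 3), keepdim=True)
std = patch.std(dim=(2, 3), keepdim=True)
print(torch.isnan((patch - mean) / std).any())             # tensor(True)

# A library InstanceNorm2d keeps an eps under the square root, so the forward
# pass stays finite, but the backward pass still picks up a 1/sqrt(var + eps)
# factor and can grow large on near-constant patches.
print(nn.InstanceNorm2d(3, eps=1e-5)(patch).abs().max())   # tensor(0.)
```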

Po-Hsun-Su commented 6 years ago

Cool! I'm closing this issue since you solved it.