Junshk / CinCGAN-pytorch

Unofficial Implementation of "Unsupervised Image Super-Resolution using Cycle-in-Cycle Generative Adversarial Networks" in CVPR 2018.

NaN loss #8

Open yxiao009 opened 5 years ago

yxiao009 commented 5 years ago

Hi,

I'm getting all losses as NaN, and I found that in main.py, around line 200, `dn_ = model[0](input_v)` returns all NaN outputs. I printed `input_v` and it looks correct. Would you please let me know what I should do?
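For reference, here is roughly the check I added around the forward pass to confirm this (the helper is just my own debugging code, not part of the repo):

```python
import torch

def check_nan(name, tensor):
    """Print a warning if any element of `tensor` is NaN or Inf."""
    if torch.isnan(tensor).any() or torch.isinf(tensor).any():
        print(f'{name} contains NaN/Inf '
              f'(min={tensor.min().item()}, max={tensor.max().item()})')

# Used right after the forward pass in main.py (illustrative):
# dn_ = model[0](input_v)
# check_nan('dn_', dn_)
```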

Thanks!

Junshk commented 5 years ago

Did you retrain the model? In that case, download the newest version; the previous version of my code had a bug. Also check whether the pretrained model exists.

yxiao009 commented 5 years ago

> Did you retrain the model? In that case, download the newest version; the previous version of my code had a bug. Also check whether the pretrained model exists.

Yes, I tried both with and without the pretrained model. With the pretrained model, I get NaN loss from the very beginning; when retraining from scratch, I get NaN loss after a few iterations, before the first epoch finishes. I also printed the outputs of model[0]: the first few iterations return proper values, but after about 5 iterations it starts returning NaN. Please help~
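In case it helps with debugging, I've also been running with PyTorch's anomaly detection enabled, so the backward pass that first produces a NaN raises an error (it slows training down, so I only use it while debugging):

```python
import torch

# Enable before the training loop: any backward pass producing NaN/Inf
# raises an error that points at the operation which created it.
torch.autograd.set_detect_anomaly(True)
```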

wl082013 commented 5 years ago

Hi, did you solve the problem? I'm having the same issue.

yxiao009 commented 5 years ago

@wl082013 Nope... But it seems the NaN comes from model[0] and model[3] after several iterations, so I'm going to look into the details of model[0] (it has the same structure as model[3]).
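My plan is to register forward hooks on every submodule so I can see which layer is the first to output NaN; a rough sketch (my own helper, not from the repo):

```python
import torch

def add_nan_hooks(model, tag):
    """Register forward hooks that report any submodule whose output
    contains NaN/Inf (debugging only; remove the hooks afterwards)."""
    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and \
                    (torch.isnan(output).any() or torch.isinf(output).any()):
                print(f'{tag}.{name} ({module.__class__.__name__}) produced NaN/Inf')
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))

# Usage (illustrative): add_nan_hooks(model[0], 'model[0]')
```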

huaixu16 commented 4 years ago

> @wl082013 Nope... But it seems the NaN comes from model[0] and model[3] after several iterations, so I'm going to look into the details of model[0] (it has the same structure as model[3]).

Hi! Did you solve this problem? I don't get a NaN loss, but every time I start training I get a very large loss, e.g. 1.3e14. How can I fix this? Could you please give some advice? Thank you!

shivaang12 commented 3 years ago

@huaixu16 Were you able to solve the problem?

shivaang12 commented 3 years ago

I figured it out: change args.res_scale to 0.1 (it defaults to 4.0). The default of 4.0 causes NaN values to appear after passing through the ResNet part of EDSR.
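For context, in EDSR-style blocks `res_scale` multiplies the residual branch before it is added back to the skip connection, so a large factor like 4.0 lets the activations grow block after block until they overflow. A rough sketch of such a block (not the exact code in this repo):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """EDSR-style residual block: the residual branch is scaled by
    `res_scale` before the skip connection (0.1 keeps it stable)."""
    def __init__(self, n_feats=64, kernel_size=3, res_scale=0.1):
        super().__init__()
        pad = kernel_size // 2
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats, kernel_size, padding=pad),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats, n_feats, kernel_size, padding=pad),
        )
        self.res_scale = res_scale

    def forward(self, x):
        return x + self.body(x) * self.res_scale
```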

qq1846482927 commented 3 years ago

> I figured it out: change args.res_scale to 0.1 (it defaults to 4.0). The default of 4.0 causes NaN values to appear after passing through the ResNet part of EDSR.

4.0 is the super-resolution scale. In the original paper "Enhanced Deep Residual Networks for Single Image Super-Resolution", this is described as follows: "we found that increasing the number of feature maps above a certain level would make the training procedure numerically unstable. A similar phenomenon was reported by Szegedy et al. [24]. We resolve this issue by adopting the residual scaling [24] with factor 0.1." I changed the factor to 0.1, but the problem still isn't solved. Could you describe the fix in more detail? I would be very grateful.

qq1846482927 commented 3 years ago

@shivaang12