NVlabs / MUNIT

Multimodal Unsupervised Image-to-Image Translation

Loss became NaN when training MUNIT on our night2day dataset #35

Closed: 805300384 closed this issue 5 years ago

805300384 commented 6 years ago

The loss, parameters, and network weights became NaN after training MUNIT for some time. I don't know how it happened.

I have tried adjusting some hyperparameters, but it did not help. I would like to know what I can change to solve this problem.

I'm not very familiar with this topic, so I would really appreciate your answer.

Cuky88 commented 6 years ago

Without further details no one can tell you what's wrong. MUNIT works fine with the given settings and parameters.

Since you changed parameters, it would help if you posted your settings and any other changes. Besides that, which datasets are you using?

When I have encountered such problems, in most cases the cause was either a learning rate that was too high or erroneous data (the images themselves), assuming of course that you did not change anything in the code.

oleksandrlazariev commented 5 years ago

@805300384 the problem is in the dataloader. Check what the dataloader samples. Sometimes it can return samples filled entirely with 1s, which leads to NaNs.
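One way to run the check suggested here is to scan a number of batches and flag any that are constant (for example, all 1s) or contain NaN/Inf values. This is only a minimal sketch: `describe_batch` and `find_suspicious_batches` are hypothetical helper names, not part of MUNIT, and the code assumes each batch can be converted with `np.asarray` (a NumPy array, a nested list, or a CPU tensor) and is not empty.

```python
import numpy as np


def describe_batch(batch):
    """Return basic statistics for one batch of image data."""
    # Assumption: the batch converts cleanly to a float array.
    arr = np.asarray(batch, dtype=np.float64)
    lo, hi = float(arr.min()), float(arr.max())
    return {
        "min": lo,
        "max": hi,
        # True if any value is NaN or +/-Inf.
        "has_non_finite": bool(not np.isfinite(arr).all()),
        # True if every value is identical, e.g. an image filled with 1s.
        "is_constant": bool(lo == hi),
    }


def find_suspicious_batches(batches, limit=None):
    """Return (index, stats) for batches that are constant or non-finite."""
    suspicious = []
    for i, batch in enumerate(batches):
        if limit is not None and i >= limit:
            break
        stats = describe_batch(batch)
        if stats["has_non_finite"] or stats["is_constant"]:
            suspicious.append((i, stats))
    return suspicious
```

Passing the training loader (or any iterable of image batches) to `find_suspicious_batches` before training gives the indices of the batches worth inspecting by hand.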

circlehy commented 5 years ago

Same problem here. I am using my own dataset, which has been used with many other algorithms, and the training losses (trainer.loss_gen_total and trainer.loss_dis_total) turn NaN quickly. I then tried learning rates ranging from 0.0001 down to 0.0000001, but that did not fix the problem. I am not sure what may be wrong.
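To find out when the losses first go bad, it can help to stop training at the first non-finite value rather than letting NaNs propagate into the weights. The following is a minimal sketch, not MUNIT code: `non_finite_losses` and `check_losses` are hypothetical helpers, and it assumes each loss value can be converted with `float()` (a Python number or a scalar tensor).

```python
import math


def non_finite_losses(losses):
    """Return the names of losses whose value is NaN or infinite."""
    return [
        name
        for name, value in losses.items()
        if not math.isfinite(float(value))
    ]


def check_losses(iteration, losses):
    """Raise as soon as any monitored loss stops being finite."""
    bad = non_finite_losses(losses)
    if bad:
        raise FloatingPointError(
            f"iteration {iteration}: non-finite losses: {', '.join(bad)}"
        )
```

Called once per iteration with something like `{"loss_gen_total": trainer.loss_gen_total, "loss_dis_total": trainer.loss_dis_total}`, this pinpoints the iteration, and therefore the batch, at which the problem first appears.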

805300384 commented 5 years ago

> @805300384 the problem is in the dataloader. Check what the dataloader samples. Sometimes it can return samples filled entirely with 1s, which leads to NaNs.

Thank you very much.

805300384 commented 5 years ago

> Same problem here. I am using my own dataset, which has been used with many other algorithms, and the training losses (trainer.loss_gen_total and trainer.loss_dis_total) turn NaN quickly. I then tried learning rates ranging from 0.0001 down to 0.0000001, but that did not fix the problem. I am not sure what may be wrong.

I changed my dataset and the problem was solved.

syz825211943 commented 5 years ago

So how did you solve this problem? Could you give me some suggestions? Thanks!