Li-Chongyi / Zero-DCE

Zero-DCE code and model

Why is the training loss NaN? #35

Open senlin-ali opened 2 years ago

senlin-ali commented 2 years ago

Hi, I have a question about training. When I train on my own data, the loss decreases from about 1.0 to 0.8, and then becomes NaN. How can I solve this?
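One way to narrow this down while debugging is to enable autograd anomaly detection, guard against non-finite losses, and clip gradients before the optimizer step. The sketch below is not from the repo: the tiny network and the L1-style loss are only stand-ins for the Zero-DCE model and its four loss terms, and the clip norm of 0.1 is just a small illustrative value.

```python
import torch
import torch.nn as nn

# Toy stand-in for the enhancement network; the real Zero-DCE model
# and its four loss terms would go here.
model = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Anomaly detection reports the backward op that first produced NaN.
# It slows training, so only enable it while debugging.
torch.autograd.set_detect_anomaly(True)

for iteration in range(100):
    img_lowlight = torch.rand(8, 3, 64, 64)        # stand-in for a real batch
    enhanced = model(img_lowlight)
    loss = (enhanced - img_lowlight).abs().mean()  # stand-in for the Zero-DCE losses

    # Skip the update (and log the iteration) if the loss is already non-finite.
    if not torch.isfinite(loss):
        print(f"non-finite loss at iteration {iteration}")
        continue

    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping often prevents the blow-up that ends in NaN;
    # a small max norm such as 0.1 is typical for this kind of training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
    optimizer.step()
```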

WEIZHIHONG720 commented 2 years ago

I also have the same problem

Chzzi commented 2 years ago

> Hi, I have a question about training. When I train on my own data, the loss decreases from about 1.0 to 0.8, and then becomes NaN. How can I solve this?

I also have the same problem. My dataset only contains low-light images. Is it necessary for the training set to contain both low-light and normal-light images?

cwzzzzz commented 1 year ago

I encountered the same problem, and each time it occurred at a specific iteration of the first epoch. At that iteration, all four components of the loss become NaN. I tried moving the images from that iteration into the first batch of training, but the loss did not become NaN in the first batch, so I suspect this has nothing to do with my data.

For reference, the training hyperparameters I use are all default values. I also tried loading the pretrained model, but the loss still became NaN at a specific iteration.

Do you have any good suggestions? If you need other training details, I can provide them.

We look forward to your reply. Thank you! @Li-Chongyi
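Since all four loss terms become NaN at the same iteration, one way to isolate the trigger is to check each term separately before backpropagating and dump the batch that first produces a non-finite value. The helper below is only a sketch, not code from the repo; the dictionary keys, file name, and function name are placeholders.

```python
import torch

def check_losses(loss_dict, iteration, batch):
    """Report which loss component first becomes non-finite and save the
    offending batch so it can be replayed in isolation.

    `loss_dict` maps a name (e.g. 'tv', 'spa', 'col', 'exp') to a scalar
    loss tensor; the names are placeholders, not the repo's variables."""
    bad = [name for name, value in loss_dict.items() if not torch.isfinite(value)]
    if bad:
        print(f"iteration {iteration}: non-finite loss terms: {bad}")
        torch.save(batch.detach().cpu(), f"bad_batch_iter{iteration}.pt")
        return False
    return True
```

Calling this in the training loop right before `loss.backward()` and skipping the step when it returns `False` keeps training alive; replaying the saved batch alone, or lowering the learning rate and tightening gradient clipping, usually makes it clear whether the trigger is a degenerate input (for example an almost all-black image) or an exploding update.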