chairc / Integrated-Design-Diffusion-Model

IDDM (Industrial, landscape, animate, spectrogram...), support DDPM, DDIM, PLMS, webui and multi-GPU distributed training. PyTorch implementation, generative model, diffusion model, distributed training
Apache License 2.0
152 stars 22 forks

Why is the MSE loss NaN in training? #60

Open bestl1fe opened 7 months ago

bestl1fe commented 7 months ago

When I trained on size-120 images, NaN appeared at epoch 30 and all generated images turned black. I had not been saving the model during training, so I had to retrain it.

I think we should add a way to prevent NaN.


chairc commented 7 months ago

Good work. Could you submit a PR for this issue?

chairc commented 7 months ago

Possible causes of this problem include an excessively high learning rate, numerically unstable operations in the loss function, and issues with the activation function, among others. A future version will include a check for NaN values.
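For illustration only, here is a minimal sketch of what such a check could look like. It is not the function later submitted to this repository; `LossGuard`, `NonFiniteLossError`, and `max_bad_steps` are hypothetical names. It uses only the standard library and operates on plain floats, on the assumption that a PyTorch loop would pass in `loss.item()`.

```python
import math


class NonFiniteLossError(RuntimeError):
    """Raised when too many consecutive losses are NaN or infinite."""


class LossGuard:
    """Hypothetical helper: flag NaN/Inf losses before they reach the weights."""

    def __init__(self, max_bad_steps=3):
        self.max_bad_steps = max_bad_steps
        self.bad_steps = 0  # consecutive non-finite losses seen so far

    def check(self, loss_value):
        """Return True if the step is safe to apply, False if it should be skipped."""
        if math.isfinite(loss_value):
            self.bad_steps = 0
            return True
        self.bad_steps += 1
        if self.bad_steps >= self.max_bad_steps:
            # Stop early so the last good checkpoint is not overwritten.
            raise NonFiniteLossError(
                f"loss was non-finite for {self.bad_steps} consecutive steps"
            )
        return False


# Simulated loss trace; in a real loop each value would come from loss.item().
guard = LossGuard(max_bad_steps=3)
losses = [0.52, 0.31, float("nan"), 0.29, float("inf")]
applied = [guard.check(v) for v in losses]
print(applied)  # [True, True, False, True, False]
```

In a training loop, one option is to call `guard.check(loss.item())` before `loss.backward()` and skip the optimizer step when it returns `False`. This only detects the symptom; lowering the learning rate or clipping gradients (for example with `torch.nn.utils.clip_grad_norm_`) targets the likely causes listed above, and saving a checkpoint every few epochs limits how much work is lost.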

chairc commented 1 month ago

We have submitted a function for NaN detection in #89.