loss nan for distributed deep learning

HuiZhang0812 / DiffusionAD


loss nan for distributed deep learning #33

Closed. cclamd closed this issue 8 months ago

cclamd commented 8 months ago

Hi @HuiZhang0812,

https://github.com/HuiZhang0812/DiffusionAD/issues/26

I don't have an A100 (80G), so I trained the model with distributed training on six 3090 cards, but the loss is also NaN! So does it decrease with distributed training? [Screenshot 2024-01-31 144712]

Best regards!

HuiZhang0812 commented 8 months ago

When the batch size is small, an entire batch may consist solely of abnormal samples, which breaks the calculation of the loss formula in the paper (Formula 9). We have fixed this bug. Please try again.
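
For readers hitting the same symptom, here is a minimal sketch of the failure mode described above. It is not the repository's code or its actual fix. It assumes (hypothetically) that one loss term is averaged over only the normal samples in a batch, so a batch with no normal samples has nothing to average. The function names and the 0.5 abnormal-sample probability are made up for illustration.

```python
import math


def prob_all_abnormal(p_abnormal: float, batch_size: int) -> float:
    """Chance that every sample in one batch is abnormal (independent draws assumed)."""
    return p_abnormal ** batch_size


def naive_normal_mean(losses, is_abnormal):
    """Average the per-sample losses of the normal samples only (hypothetical loss term)."""
    normal = [loss for loss, abnormal in zip(losses, is_abnormal) if not abnormal]
    if not normal:
        # Mirrors tensor libraries, where the mean of an empty selection is NaN.
        return float("nan")
    return sum(normal) / len(normal)


def guarded_normal_mean(losses, is_abnormal):
    """Same term, but contribute 0.0 when the batch has no normal samples."""
    value = naive_normal_mean(losses, is_abnormal)
    return 0.0 if math.isnan(value) else value


if __name__ == "__main__":
    # Hypothetical numbers: half of the training samples are abnormal.
    for batch_size in (2, 4, 16):
        print(batch_size, prob_all_abnormal(0.5, batch_size))

    # A batch made up entirely of abnormal samples.
    losses, flags = [0.7, 1.2], [True, True]
    print(naive_normal_mean(losses, flags))    # NaN propagates into the total loss
    print(guarded_normal_mean(losses, flags))  # guarded term stays finite
```

With data-parallel training each process computes the loss on its own local batch, so the batch size that matters here is the per-GPU one, not the total across the six cards.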