HuiZhang0812 / DiffusionAD


nan loss (noise loss) #29

Closed CarlosNacher closed 8 months ago

CarlosNacher commented 8 months ago

Hi @HuiZhang0812 ,

I have noticed that if all samples in a batch are anomalous (anomaly_label == 1), the loss computed inside the function norm_guided_one_step_denoising in DDPM.py becomes nan:

line 359 -> loss = (normal_loss["loss"]+noisier_loss["loss"])[anomaly_label==0].mean() (because .mean() is then taken over an empty tensor [])
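The failure is easy to reproduce in isolation; here is a minimal sketch, independent of the repository code:

```python
import torch

# Simulate a batch in which every sample is anomalous (anomaly_label == 1).
loss_per_sample = torch.tensor([0.3, 0.7, 0.5])
anomaly_label = torch.tensor([1, 1, 1])

# Indexing with an all-False mask yields an empty tensor,
# and .mean() over an empty tensor is nan.
loss = loss_per_sample[anomaly_label == 0].mean()
print(loss)  # tensor(nan)
```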

And according to the equation (9) of your original paper

[image: Equation (9) from the paper]

The (1 - y) term indicates that when a sample is anomalous (y = 1), Lnoise should be 0. One possible way to make the code reflect this is the following:

after line 359 (quoted above) we can add

if torch.isnan(loss):
    # every sample in the batch was anomalous, so the
    # normal-sample loss should contribute 0, as in Eq. (9)
    loss.fill_(0.0)
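An equivalent variant that avoids the in-place fill_ and never produces nan in the first place could look like this; the helper name and signature are hypothetical, not taken from DDPM.py:

```python
import torch

def masked_normal_loss(per_sample_loss: torch.Tensor,
                       anomaly_label: torch.Tensor) -> torch.Tensor:
    """Mean loss over normal samples only; zero when the batch has none.

    Hypothetical helper sketching the guard; it does not appear in DDPM.py.
    """
    normal = per_sample_loss[anomaly_label == 0]
    if normal.numel() == 0:
        # No normal samples in this batch: contribute zero while keeping
        # the autograd graph, device, and dtype of the original loss.
        return per_sample_loss.sum() * 0.0
    return normal.mean()
```

This keeps line 359 expressible as a single call and degrades gracefully on all-anomalous batches.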

Thank you very much for your work!

If there are any additional comments or adjustments needed, I would be happy to collaborate to make sure this issue is fully resolved. Also, if there are other contributors who would like to review the solution, your comments are welcome!

I appreciate your help in this matter and look forward to hearing your comments.

Best regards! N

HuiZhang0812 commented 8 months ago

Thank you for your suggestion. The default batch size is 16. When the batch size is small, an entire batch may indeed consist solely of abnormal samples, which affects the calculation of the paper's loss formula (Formula 9). We therefore recommend using a batch size greater than or equal to the default (16), and in the future we will provide more robust code to avoid the case where an entire small batch consists solely of abnormal samples.
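For intuition, assuming samples are drawn independently with anomaly fraction p, the chance of an all-anomalous batch shrinks geometrically with batch size (the p = 0.5 figure below is purely illustrative, not a value from the paper):

```python
def all_anomalous_prob(p: float, batch_size: int) -> float:
    """Probability that every sample in a batch is anomalous,
    assuming i.i.d. draws with per-sample anomaly probability p."""
    return p ** batch_size

p = 0.5  # illustrative anomaly fraction
for bs in (4, 8, 16):
    # e.g. p = 0.5, batch of 16 -> 0.5**16 ≈ 1.5e-05
    print(bs, all_anomalous_prob(p, bs))
```

So a larger batch makes the failure rare but never impossible, which is why a code-level guard is still worthwhile.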

HuiZhang0812 commented 8 months ago

Thank you very much! In order to enable DiffusionAD to be deployed on any type of GPU, we have adopted your valuable suggestions.

CarlosNacher commented 8 months ago

> Thank you very much! In order to enable DiffusionAD to be deployed on any type of GPU, we have adopted your valuable suggestions.

Many thanks to you! With mid-range GPUs you can rarely use very large batch sizes, so this problem can easily occur. Thanks again for letting me raise this point and for taking it into account in such a constructive and gracious way; it is a small contribution to the great work you have shared with the community.

Best regards!