cvignac / DiGress

Code for the paper "DiGress: Discrete Denoising diffusion for graph generation"
MIT License

Validation step returns nan loss because of division by zero #98

Open: rudolfwilliam opened this issue 1 week ago

rudolfwilliam commented 1 week ago

When training the DiscreteDenoisingDiffusion model on the guidance branch, the validation step (but not the training step) returns NaN as the loss. The problem appears to originate at line 289 of DiGress/src/diffusion/diffusion_utils.py, where the denominator is zero for some entries; this in turn causes a division by zero at line 291. Interestingly, there is some commented-out code that replaces zero entries with one, which would solve the issue. Why is this code commented out? How else could this problem be solved?
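A minimal sketch of the failure and of the commented-out workaround, written with NumPy (PyTorch follows the same IEEE floating-point rules, so 0/0 also gives NaN there). The helper `normalize_posterior` and its arguments are hypothetical illustrations, not the actual DiGress code at lines 289 to 291. The sketch assumes that a row with a zero denominator also has an all-zero numerator (for example, a padded position), so replacing the denominator with one yields zeros instead of NaN.

```python
import numpy as np


def normalize_posterior(numerator, denominator, replace_zero_denominator):
    """Divide unnormalized probabilities by their normalizer (hypothetical helper).

    numerator:   array of shape (n, d), unnormalized probabilities
    denominator: array of shape (n, 1), one normalizer per row
    """
    denominator = np.array(denominator, dtype=float)  # copy; do not mutate the caller's array
    if replace_zero_denominator:
        # Mirrors the commented-out workaround: zero normalizers become one
        denominator[denominator == 0] = 1.0
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.asarray(numerator, dtype=float) / denominator


numerator = np.array([[0.2, 0.3],
                      [0.0, 0.0]])  # second row: assumed to be a padded position
denominator = numerator.sum(axis=-1, keepdims=True)  # second row sums to 0.0

print(normalize_posterior(numerator, denominator, replace_zero_denominator=False))
# second row is [nan, nan]
print(normalize_posterior(numerator, denominator, replace_zero_denominator=True))
# second row is [0., 0.]
```

The replacement only hides the symptom if a zero denominator can ever occur alongside a nonzero numerator, because the result would then no longer be a valid probability. It may be worth checking whether the zero rows correspond to positions that are masked out later in the loss.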

rudolfwilliam commented 1 week ago

Similarly, the kl_prior() function in DiGress/src/diffusion_model_discrete.py returns NaN because probX (line 291) and probE (line 292) evaluate to zero for some entries, which produces a NaN KL divergence.
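One common way a KL term becomes NaN is through evaluating 0 · log 0 in floating point: log(0) is -inf, and -inf minus -inf (or 0 times -inf) is NaN. The NumPy sketch below contrasts a naive KL(q ‖ p) with a guarded version that uses the convention 0 · log(0 / p) = 0 and clamps p away from zero. The function names and the `eps` constant are hypothetical, and the issue does not say which argument probX and probE play in kl_prior(), so this only illustrates the numerical mechanism.

```python
import numpy as np


def kl_naive(q, p):
    """KL(q || p) summed over the last axis, with no guard against zeros."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        # log(0) = -inf, and 0 * (-inf - (-inf)) = NaN, which poisons the sum
        return np.sum(q * (np.log(q) - np.log(p)), axis=-1)


def kl_guarded(q, p, eps=1e-12):
    """KL(q || p) using the convention 0 * log(0 / p) = 0.

    eps is a hypothetical clamping constant that keeps log(p) finite.
    """
    q = np.asarray(q, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    positive = q > 0
    # Take log(q) only where q > 0; elsewhere use log(1) = 0 as a placeholder
    log_ratio = np.log(np.where(positive, q, 1.0)) - np.log(p)
    return np.sum(np.where(positive, q * log_ratio, 0.0), axis=-1)


q = np.array([[0.5, 0.5, 0.0]])
p = np.array([[0.5, 0.5, 0.0]])
print(kl_naive(q, p))    # [nan]
print(kl_guarded(q, p))  # [0.]
```

Clamping p turns a genuinely infinite KL (q > 0 where p = 0) into a large finite number. That keeps the logged loss finite but can mask a real mismatch, so masking out padded entries before computing the KL may be the cleaner alternative.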