When training the DiscreteDenoisingDiffusion model on the guidance branch, the validation step (but not the training step) returns NaN as the loss. It appears to be caused by line 289 in DiGress/src/diffusion/diffusion_utils.py, where the denominator is zero for some entries, which in turn leads to division by zero in line 291. Interestingly, there is commented-out code right there that replaces zero entries with one, which would solve the issue. Why is this code commented out? How else can this problem be solved?
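For reference, here is a minimal numpy sketch of the failure mode I mean (the array names are hypothetical, not the actual variables in diffusion_utils.py): normalizing per-row probabilities by their row sum produces NaN whenever a row sums to zero, and replacing zero denominators with one avoids it.

```python
import numpy as np

# Hypothetical reproduction: second row of the probability tensor is all-zero.
prob = np.array([[0.2, 0.8],
                 [0.0, 0.0]])
denom = prob.sum(axis=-1, keepdims=True)

# Naive normalization would compute 0 / 0 -> NaN for the all-zero row:
# normalized = prob / denom

# Fix in the spirit of the commented-out code: replace zero denominators
# with one, so all-zero rows stay all-zero instead of turning into NaN.
safe_denom = np.where(denom == 0, 1.0, denom)
normalized = prob / safe_denom
```

In PyTorch the same guard can be written with `torch.where` or by masking `denom[denom == 0] = 1` before the division.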
Similarly, the kl_prior() function in DiGress/src/diffusion_model_discrete.py returns NaN because probX in line 291 and probE in line 292 evaluate to zero for some entries, leading to a NaN KL divergence.
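The KL case is the same kind of problem, sketched below with numpy (again a hypothetical illustration, not the actual kl_prior code): an exact zero in the distribution makes the term 0 * log(0) evaluate to 0 * (-inf) = NaN, and clamping the probabilities away from zero before the log avoids it.

```python
import numpy as np

# Hypothetical distributions: p contains an exact zero.
p = np.array([0.0, 1.0])
q = np.array([0.5, 0.5])

# Naive KL(p || q) would yield NaN, since 0 * log(0 / 0.5) = 0 * (-inf):
# kl = np.sum(p * np.log(p / q))

# One fix: clamp probabilities to a small epsilon before taking the log,
# analogous to applying torch.clamp(prob, min=eps) before the KL computation.
eps = 1e-12
p_safe = np.clip(p, eps, 1.0)
kl = np.sum(p_safe * np.log(p_safe / q))
```

Whether clamping or the zero-to-one replacement is the intended fix here is exactly what I am asking: both remove the NaN, but they treat the degenerate entries differently.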