cvignac / DiGress

code for the paper "DiGress: Discrete Denoising diffusion for graph generation"
MIT License

Validation results show nan all the time #51

Closed FairyFali closed 1 year ago

FairyFali commented 1 year ago

I encountered a strange result during validation. The output is:

Starting train epoch... Epoch X: Val NLL nan -- Val Atom type KL nan -- Val Edge type KL: nan Val loss: nan Best val loss: 100000000.0000

The NLL is always nan. Why?

cvignac commented 1 year ago

Hello, this is not normal. What command are you running? Is it on a custom dataset?

FairyFali commented 1 year ago

> Hello, this is not normal. What command are you running? Is it on a custom dataset?

I ran the command mentioned in the README file, specifically `python3 main.py`. It is on QM9 with discrete noise.

cvignac commented 1 year ago

I'm not sure where the exact issue came from (probably a different behavior of mask_distributions with recent python versions), but it's now fixed. You can use the latest commit.

FairyFali commented 12 months ago

> I'm not sure where the exact issue came from (probably a different behavior of mask_distributions with recent python versions), but it's now fixed. You can use the latest commit.

I think I figured it out: functions like kl_prior and compute_Lt call .log / torch.log without handling zero probabilities, so log(0) appears in the loss and produces nan.