Why set the normalize value and bias?

Yes, of course. We ended up not using the bias, but the normalize values are used for h_onehot (the atom type) and h_int (the charge). They correspond to the paragraph titled "Scaling Features" in the paper. Scaling the features changes the diffusion process, and therefore the generative process that needs to be learned.

Essentially, we found that multiplying h_onehot by 0.25 helps a lot: in our experiments, the model trained with this 0.25 setting scored much higher on the stability metrics. In code we achieve this by giving the appropriate "normalize_values".

We think that it is easier for the model to first get a rough estimate of x (positions) and then decide h (type & charge). By multiplying h by 0.25, the denoising process focuses more on x in the early stages of the generative process, and only later on h.
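To make the effect of the scaling concrete, here is a minimal sketch; it is not the repository's actual code. The function names (`scale_features`, `unscale_features`, `snr`), the `int_scale` placeholder, and the example values of `alpha_t` and `sigma_t` are assumptions made for illustration. Only the 0.25 factor on h_onehot comes from the answer above. The sketch assumes a noising step of the form z_t = alpha_t * v + sigma_t * eps, in which case scaling a feature by s scales its signal-to-noise ratio by s².

```python
import numpy as np

def scale_features(h_onehot, h_int, onehot_scale=0.25, int_scale=1.0, bias=0.0):
    """Shift and scale node features before noise is added.

    onehot_scale=0.25 follows the answer above; int_scale and bias are
    placeholders (the bias was reportedly unused, so it defaults to 0).
    """
    return (h_onehot - bias) * onehot_scale, (h_int - bias) * int_scale

def unscale_features(h_onehot_s, h_int_s, onehot_scale=0.25, int_scale=1.0, bias=0.0):
    """Invert scale_features, e.g. after sampling."""
    return h_onehot_s / onehot_scale + bias, h_int_s / int_scale + bias

def snr(alpha_t, sigma_t, scale=1.0):
    """SNR of z_t = alpha_t * (scale * v) + sigma_t * eps for unit-magnitude v."""
    return (alpha_t * scale) ** 2 / sigma_t ** 2

# Four atoms, five hypothetical atom types, integer charges.
h_onehot = np.eye(5)[[0, 1, 1, 3]]
h_int = np.array([[0.0], [1.0], [-1.0], [0.0]])

h_onehot_scaled, h_int_scaled = scale_features(h_onehot, h_int)
h_onehot_back, h_int_back = unscale_features(h_onehot_scaled, h_int_scaled)

# At the same (arbitrary) noise level, compare unscaled x with scaled h.
alpha_t, sigma_t = 0.6, 0.8
snr_x = snr(alpha_t, sigma_t)              # positions, scale 1
snr_h = snr(alpha_t, sigma_t, scale=0.25)  # atom types, scale 0.25
ratio = snr_h / snr_x                      # equals 0.25 ** 2
```

Under these assumptions, the atom-type signal is 16 times weaker relative to the noise than the position signal at every timestep, which is consistent with the intuition that the sampler settles x first and h later.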

ehoogeboom / e3_diffusion_for_molecules
