About the generation process

tszslovewanpu commented 6 months ago

Hello, and great job! 1. When generating the 10K molecules in Table 1, Table 2, Table 3, etc., do we need to input some molecules, for example from ZINC250K or MOSES? Or does SGDS generate molecules by sampling the latent vector $z$ after being trained on the ZINC250K and MOSES datasets, i.e., the inference process goes from $z$ to $x$ as in Figure 1?

2. I ask because other methods in Table 1, such as LIMO, generate molecules by taking an input molecule from ZINC250K and returning a better molecule with a higher QED property. Their generation process is therefore actually an optimization process. Does SGDS work the same way?

Thank you very much!

deqiankong commented 6 months ago

Thanks for your questions.

In generation, we sample from $p(z|y)$, which takes only the property value $y$ as input. In the experiments, we initialize $y$ with the values from the 10k test set. However, it should also be fine to start from other values of $y$, such as the 10k high property values in the training set.
In the optimization experiments, the generated molecules are conditioned only on the property values, i.e., $p(x|y)$. In a structure-constrained experiment, however, we may need to improve the property starting from a given molecular backbone $\tilde{x}$, i.e., $p(x|y, \tilde{x})$, which is not studied in SGDS.
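
To make the data flow concrete, here is a minimal toy sketch of property-conditioned generation: draw $z$ given only $y$, then decode $z$ to $x$. It is not the SGDS code. The function names (`sample_latent_given_property`, `decode`, `generate`), the Gaussian stand-in for $p(z|y)$, and the four-letter token alphabet are all hypothetical and chosen only for illustration; the outputs are toy token strings, not valid SMILES. The sketch assumes $x$ depends on $y$ only through $z$, so that $p(x|y) = \int p(x|z)\,p(z|y)\,dz$.

```python
import random


def sample_latent_given_property(y, rng, dim=4, noise=0.1):
    """Toy stand-in for z ~ p(z | y): a Gaussian whose mean depends only on y.

    Assumption: the real model samples z differently; this only mimics the
    interface, where the target property value y is the sole input.
    """
    return [y + rng.gauss(0.0, noise) for _ in range(dim)]


def decode(z):
    """Toy stand-in for the generator p(x | z): map each latent value to a token."""
    alphabet = "CNOF"  # hypothetical token set, not a real molecular vocabulary
    return "".join(alphabet[int(abs(v) * 10) % len(alphabet)] for v in z)


def generate(target_values, seed=0):
    """Generate one toy 'molecule' per target property value y.

    No input molecule is passed in, which is the difference from
    structure-constrained optimization p(x | y, x_tilde).
    """
    rng = random.Random(seed)
    return [decode(sample_latent_given_property(y, rng)) for y in target_values]


if __name__ == "__main__":
    # Target values could come from a test set or from high training-set values.
    for y, x in zip([0.9, 0.8, 0.7], generate([0.9, 0.8, 0.7])):
        print(y, x)
```

Note that `generate` accepts only a list of property values. A structure-constrained variant would additionally need a seed molecule $\tilde{x}$ as an argument, which this sketch deliberately leaves out.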

tszslovewanpu commented 6 months ago

Got it, thank you!

deqiankong / SGDS

About the generation process #2