Shen-Lab / LDM-3DG

[ICLR 2024] "Latent 3D Graph Diffusion" by Yuning You, Ruida Zhou, Jiwoong Park, Haotian Xu, Chao Tian, Zhangyang Wang, Yang Shen
GNU General Public License v3.0

Question about reproducing results #4

Closed by KyGao 2 months ago

KyGao commented 2 months ago

Hi Yuning,

I am very interested in your great work, and I'm attempting to reproduce your results using the Jupyter notebook. However, I've encountered some challenges with the 2D distribution results when using the provided sampled SMILES data.

Specifically, I directly evaluated AE_geom_uncond_weights_and_data/job17_latent_ddpm_qm9_spatial_graphs/sample_smiles.pt and sample_conformer.pt. For the test SMILES, I loaded e3_diffusion_for_molecules/data/smiles_qm9.txt. All the files were sourced from your comprehensive Zenodo repository.

I've attached a figure showing the results I obtained. Could you kindly advise if I'm using the correct data or provide any guidance on the appropriate files to use for reproducing the 2D distribution results?

[attached figure: 2D distribution results]

yryMax commented 2 months ago

Hello, I am not from the team behind this paper, but I have tried the same thing and may be able to answer (part of) your question. stats_tar is a scaled distribution: the goal is for stats_tar and stats_gen to represent the same number of samples. In their case they use 30000 as the scaling factor because, after filtering, their generation set has 30000 samples (you can verify this in the evaluation notebook output). So in your case you need to change this factor to 3425 (because only 3425 molecules remain after the distribution-preserving sampling).

This may give you a better result (a smaller distance).
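The rescaling described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code; the function name, variable names, and the mean-absolute-difference distance are all assumptions for the sake of the example.

```python
# Hypothetical sketch of the rescaling described above: the target
# histogram is scaled so both distributions represent the same number
# of samples before a distance is computed. Names are illustrative,
# not taken from the evaluation notebook.
from collections import Counter

def scaled_distance(gen_values, tar_values):
    n_gen = len(gen_values)           # e.g. 3425 after filtering
    stats_gen = Counter(gen_values)
    tar_counts = Counter(tar_values)
    # Scale target counts so they sum to n_gen (the "scaling factor").
    scale = n_gen / len(tar_values)
    stats_tar = {k: v * scale for k, v in tar_counts.items()}
    keys = set(stats_gen) | set(stats_tar)
    # Mean absolute difference between the two count histograms.
    return sum(abs(stats_gen.get(k, 0) - stats_tar.get(k, 0))
               for k in keys) / len(keys)
```

The key point is only the `scale` line: if the generated set shrinks (e.g. to 3425 molecules after filtering), the target histogram must shrink by the same ratio, otherwise the distance is inflated by a pure sample-count mismatch.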

However, the best result I can get (pasted below) is still worse than the results in the paper (equivalent to or worse than EDM's performance).

[attached figure: evaluation results]

I'm not sure why, but I suspect it's because my samples all come from job17's checkpoint, while in their evaluation they use samples from job6's checkpoint. So if you successfully reproduce the results in the paper, please share your setup, thanks!

yyou1996 commented 2 months ago

Hi @KyGao and @yryMax,

Thank you both for raising this issue. After checking, I believe the problem results from an insufficient number of samples, which leads to inaccurate estimates of the metrics. When I reduce my original sample SMILES (used in the paper's evaluation) from ~10k to ~1k, I get results similar to yours, and the results return to normal when the sample number is increased (you can have a try with the file I upload here -- sample_smiles.pt).
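The sample-size effect described above is easy to see on a toy distribution: with few draws, the empirical histogram deviates noticeably from the true one, so a histogram-based distance is overestimated; with more draws it converges. This toy example is not from the repository; the distribution and the L1 distance are arbitrary choices for illustration.

```python
# Toy illustration (not from the repo) of why small sample counts
# inflate distance estimates: the L1 gap between the empirical and the
# true histogram shrinks roughly as 1/sqrt(n).
import random
from collections import Counter

def empirical_distance(n, seed=0):
    rng = random.Random(seed)
    true_probs = {0: 0.5, 1: 0.3, 2: 0.2}     # arbitrary 3-bin distribution
    draws = rng.choices(list(true_probs), weights=list(true_probs.values()), k=n)
    freq = Counter(draws)
    # L1 distance between empirical frequencies and true probabilities.
    return sum(abs(freq[k] / n - p) for k, p in true_probs.items())
```

Averaged over a few seeds, the distance at n=10000 is consistently much smaller than at n=100, which mirrors going from ~1k back up to ~10k sampled SMILES.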

KyGao commented 2 months ago

Thank you, @yryMax, for the explanation, and @yyou1996, for providing the additional SMILES. With the provided SMILES and the additional filtering, I was able to almost reproduce the paper's results. It seems that the baseline EDM does not apply such constraints, and I'm unsure whether this difference is significant. Anyway, thanks for addressing my question; I will close this issue.

yyou1996 commented 2 months ago

Hi,

The constraint is added for a fair comparison, to better utilize the information about the number of atoms. In the EDM case, the model generates graphs given a prefixed graph size, and the size distribution comes from the training dataset.

In our case, the graph size is not part of the model input, so the size distribution cannot be used explicitly. We instead apply this knowledge by filtering, and empirically the size distribution impacts the result a lot.
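One simple way to realize such distribution-preserving filtering is rejection sampling on the graph size: keep each generated molecule with probability proportional to the ratio between the training-set frequency of its size and the generated-set frequency. The sketch below is an illustrative assumption, not the authors' code; function and variable names are invented.

```python
# Illustrative sketch (not the authors' implementation) of filtering a
# generated set so its size distribution matches the training set's.
import random
from collections import Counter

def filter_by_size(gen_sizes, train_sizes, seed=0):
    rng = random.Random(seed)
    train_freq = Counter(train_sizes)
    gen_freq = Counter(gen_sizes)
    n_train, n_gen = len(train_sizes), len(gen_sizes)
    kept = []
    for i, s in enumerate(gen_sizes):
        p_train = train_freq.get(s, 0) / n_train
        p_gen = gen_freq[s] / n_gen
        # Accept with probability proportional to the frequency ratio,
        # capped at 1; sizes unseen in training are always rejected.
        accept = min(1.0, p_train / p_gen) if p_gen > 0 else 0.0
        if rng.random() < accept:
            kept.append(i)
    return kept   # indices of retained molecules
```

Molecules whose size never occurs in the training set are dropped outright, and over-represented sizes are thinned, so the surviving subset follows the training size distribution (at the cost of a smaller sample, e.g. the 3425 molecules mentioned above).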