gcorso / DiffDock

Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
https://arxiv.org/abs/2210.01776
MIT License
1.04k stars · 251 forks

Question on matching in the evaluation. #123

Open EDAPINENUT opened 1 year ago

EDAPINENUT commented 1 year ago

During training, the ligand's conformation is replaced by one generated with UFF and then optimized to match the ground truth. However, in the evaluation in evaluate.py you still use this virtual conformation of the molecule as the ground truth, by setting `num_conformers = 1` in the PDBBind class. Is that reasonable?

If all the molecules are generated with RDKit, some of them may be far from the ground truth even after alignment, so the model only learns the conformations that RDKit can generate rather than the true data distribution. And in the evaluation, the RMSD is likewise calculated between conformations generated by RDKit and conformations generated by the model. I don't think this test protocol is rigorous or precise: if it were acceptable, anyone could first generate their own conformer dataset and then compare against it.
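(For concreteness, the quantity I mean is the RMSD after optimal rigid alignment. A minimal numpy-only sketch of it, using the standard Kabsch algorithm, is below; `kabsch_rmsd` is my own illustration, not code from this repo.)

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two point sets (rows = atom coordinates) after
    optimally superimposing P onto Q with the Kabsch algorithm.
    Illustrative only; not the repo's evaluation code."""
    # Center both coordinate sets to remove the translation.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation.
    H = P.T @ Q
    U, S, Vt = np.linalg.svd(H)
    # Correct for a possible reflection (det = -1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T
    return np.sqrt(((P_rot - Q) ** 2).sum() / len(P))
```

Even after this alignment, a conformer with the wrong torsion angles keeps a nonzero RMSD to the crystal ligand, which is the gap I am worried about.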

Can you give any explanation for it?

gcorso commented 1 year ago

Hi @BIRD-TAO,

I'm not sure I fully understand your question, so let me know if my answer does not address it. I believe you are asking why we also perform the conformer-matching procedure during evaluation. We do it because we are interested in knowing the optimal achievable performance for those degrees of freedom, but this matching DOES NOT IMPACT the final configuration that the model produces. The only degrees of freedom changed by the matching procedure are the torsion angles and the relative position, and these same degrees of freedom are completely randomized when the sampling process starts (in the `randomize_position` procedure).
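To make the point concrete, here is a rough sketch of the idea (not the repo's actual `randomize_position` implementation; the function signature and `tr_sigma` parameter are hypothetical): before reverse diffusion begins, the matched torsion angles and the ligand's position are simply overwritten with random draws, so nothing from the matching step survives into sampling.

```python
import numpy as np

def randomize_position(torsion_angles, ligand_center, tr_sigma=10.0, rng=None):
    """Illustrative sketch: discard the conformer-matched torsion
    angles and ligand position by drawing fresh random values, as
    done at the start of sampling. Not the repo's actual code."""
    rng = np.random.default_rng() if rng is None else rng
    # Torsion angles: uniform over the full circle [0, 2*pi),
    # so the matched torsions carry no information into sampling.
    new_torsions = rng.uniform(0.0, 2.0 * np.pi, size=len(torsion_angles))
    # Relative position: Gaussian perturbation of the ligand center
    # (tr_sigma is a made-up scale for this sketch).
    new_center = ligand_center + rng.normal(0.0, tr_sigma, size=3)
    return new_torsions, new_center
```

Whatever torsions the matching procedure produced, they are replaced here before the model ever sees them.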

I hope this clarifies your question!