deepmodeling / Uni-Mol

Official Repository for the Uni-Mol Series Methods
MIT License

different SMILES, same molecule, resulting in different results #139

Open zjmyfmyf opened 11 months ago

zjmyfmyf commented 11 months ago

Hello, guys. I took the SMILES "N(CCCCN)C(=O)C(C#N)=Cc1cccn1C" and converted it to the RDKit canonical SMILES "Cn1cccc1C=C(C#N)C(=O)NCCCCN" using Chem.MolFromSmiles() and Chem.MolToSmiles(). To my surprise, the trained Uni-Mol model (a classifier) gave two different predictions for the two strings, 0.795 and 0.758. The difference is small, but it matters when the result is close to 0.5: in that case the molecule may be classified differently depending on how the SMILES is written.

I wonder whether this disagreement between predictions is part of the framework's design. It seems that different SMILES representations of the same molecule generate different conformers. Would it be better to standardize the SMILES before generating the conformer that is used as input?

guolinke commented 11 months ago

The different predictions are mainly due to the different input conformations. Even with the same SMILES string, the generated 3D conformations can differ.

You can average the predictions over multiple different 3D conformations to get more stable and better results.
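As a rough illustration of the averaging idea, here is a minimal sketch. It assumes RDKit is used to embed the conformers, and that you have some per-conformer scoring function; `predict_one` below is a hypothetical placeholder for whatever call returns your trained classifier's probability for one set of atoms and coordinates, not an actual Uni-Mol API. The names `generate_conformers` and `average_prediction` are likewise made up for this example.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Coords = List[List[float]]  # one [x, y, z] row per atom


def generate_conformers(
    smiles: str, n_confs: int = 10, seed: int = 42
) -> Tuple[List[str], List[Coords]]:
    """Embed up to n_confs 3D conformers with RDKit.

    Returns the atom symbols and one coordinate array per conformer.
    """
    # Imported lazily so average_prediction() can be used without RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # add explicit hydrogens before 3D embedding
    # A fixed randomSeed makes the embedding repeatable for the same input.
    conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=n_confs, randomSeed=seed)
    atoms = [atom.GetSymbol() for atom in mol.GetAtoms()]
    coords = [mol.GetConformer(cid).GetPositions().tolist() for cid in conf_ids]
    return atoms, coords


def average_prediction(
    atoms: Sequence[str],
    conformers: Sequence[Coords],
    predict_one: Callable[[Sequence[str], Coords], float],
) -> float:
    """Average a per-conformer score over all conformers.

    predict_one is a hypothetical stand-in for the trained model's
    single-conformer prediction; replace it with your own call.
    """
    scores = [predict_one(atoms, coords) for coords in conformers]
    if not scores:
        raise ValueError("no conformers to score")
    return mean(scores)
```

Usage would look like `atoms, confs = generate_conformers("Cn1cccc1C=C(C#N)C(=O)NCCCCN")` followed by `average_prediction(atoms, confs, predict_one)`. Two different SMILES strings for the same molecule may still embed slightly differently because the atom order differs, so the averaged scores are not guaranteed to be identical, but averaging over more conformers should narrow the gap.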