Open: zjmyfmyf opened this issue 11 months ago
The different predictions are mainly due to the different input conformations. Even for the same SMILES string, the generated 3D conformations can differ.
For more stable and better performance, you can average the predictions over multiple different 3D conformations.
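A minimal sketch of the averaging idea, not the Uni-Mol API. The names `average_prediction`, `embed_fn`, `predict_fn` and `rdkit_embed` are hypothetical: `embed_fn` stands for whatever turns a SMILES plus a seed into one 3D conformer, and `predict_fn` for whatever scores one conformer with the trained classifier.

```python
from statistics import fmean
from typing import Any, Callable


def average_prediction(
    smiles: str,
    embed_fn: Callable[[str, int], Any],
    predict_fn: Callable[[Any], float],
    n_conformers: int = 10,
    base_seed: int = 0,
) -> float:
    """Average a model's score over several conformers of one molecule.

    embed_fn and predict_fn are placeholders (assumptions), not Uni-Mol calls.
    """
    if n_conformers < 1:
        raise ValueError("n_conformers must be at least 1")
    # One conformer per seed, so the set of conformers is reproducible.
    scores = [
        predict_fn(embed_fn(smiles, base_seed + i)) for i in range(n_conformers)
    ]
    return fmean(scores)


def rdkit_embed(smiles: str, seed: int):
    """One possible embed_fn, built on RDKit's ETKDG embedding."""
    # Imported here so the averaging helper stays usable without RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)
    # EmbedMolecule returns -1 when no conformer could be generated.
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise ValueError(f"embedding failed for {smiles!r}")
    return mol
```

With a real model, the call would look something like `average_prediction(smiles, rdkit_embed, my_model_score)`, where `my_model_score` is your own wrapper around the classifier.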
Hello, guys. I took the SMILES "N(CCCCN)C(=O)C(C#N)=Cc1cccn1C" and converted it to the RDKit canonical SMILES "Cn1cccc1C=C(C#N)C(=O)NCCCCN" using Chem.MolFromSmiles() and Chem.MolToSmiles(). To my surprise, the trained Uni-Mol model (a classifier) gave two different predictions for the two strings: 0.795 and 0.758. The difference is small, but it matters when the result is close to 0.5: the same molecule may be classified differently depending on how its SMILES is written.
I wonder whether this disagreement is part of the framework's design. It seems that different SMILES representations of the same molecule produce different conformers. Would it be better to standardize the SMILES before the conformer is generated as input?
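For reference, the standardization step asked about here can be sketched with the two RDKit calls already mentioned. The helper name `canonicalize` is hypothetical, and this is only a pre-processing sketch, not something the framework is known to do internally.

```python
from rdkit import Chem


def canonicalize(smiles: str) -> str:
    """Return RDKit's canonical SMILES for the given input string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


original = "N(CCCCN)C(=O)C(C#N)=Cc1cccn1C"
rewritten = "Cn1cccc1C=C(C#N)C(=O)NCCCCN"

# Both spellings describe the same molecule, so they map to one canonical string.
same_molecule = canonicalize(original) == canonicalize(rewritten)
```

Canonicalizing first means every spelling of a molecule enters conformer generation with the same atom order. The conformer itself can still vary between runs unless the embedding seed is also fixed.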