epam / Indigo

Universal cheminformatics toolkit, utilities and database search tools
http://lifescience.opensource.epam.com
Apache License 2.0

Canonical SMILES different depending on input atom order #100

Closed. baoilleach closed this issue 7 years ago

baoilleach commented 7 years ago

Hi there, there is a rather unusual canonicalisation failure for the molecules in the example below. Tested with both 1.2.3 and 1.3.0b16.

The example code converts two Kekule SMILES of the same molecule, differing only in atom order, to canonical SMILES. The results are not the same: the second contains an explicit aromatic bond symbol (:), which is unnecessary in this context (actually, it's unnecessary in any context). The code is below; the results are:

C1c2cc3cc4ccc5cc6Cc7cc8ccc1c1c9c%10c%11c(c6c7c%10c81)c5c4c%11c3c29
C1c2cc3cc4ccc5cc6Cc7cc8ccc1c1c9c%10c%11c(c6:c7c%10c81)c5c4c%11c3c29

from __future__ import print_function
import sys
# sys.path.append(r"C:\Tools\Indigo\indigo-python-1.2.3.r0-win")
sys.path.append(r"C:\Tools\Indigo\indigo-python-1.3.0beta.r16-win")
from indigo import Indigo, IndigoException
indigo = Indigo()

kekules = [
  "C12=C3C4=C(C=C5CC6=CC7=CC=C8C=C9C%10=C%11C8=C7C7=C6C5=C4C(=C7%11)C3=C%10C(=C9)C1)C=C2",
  "C12=C3C4=C(CC5=CC6=CC7=CC=C8C9=C7C7=C6C5=C4C4=C7C9=C5C(=C8)CC(=C1)C5=C43)C=C2"
  ]

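# Aromatize each Kekule input and print its canonical SMILES.
# Both inputs describe the same molecule, so the two outputs should match.
for kekule in kekules:
    mol = indigo.loadMolecule(kekule)
    mol.aromatize()
    can = mol.canonicalSmiles()
    print(can)

IuriiPuzanov commented 7 years ago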

Hello Noel,

Thank you very much for finding this. The cause appears to be that a different cycle basis is computed for the two atom orderings. For such complex cases, I recommend canonicalizing the structure before aromatizing it. That way the results will be the same in both cases.

Best Regards! Yuriy
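
One possible reading of the suggested workaround, as a minimal sketch: take the canonical SMILES of the Kekule input first, reload that string, and only then aromatize. The helper name `canonical_then_aromatize` is hypothetical, and it is an assumption that "preliminary canonicalization" means a canonicalSmiles() round trip; only loadMolecule(), aromatize() and canonicalSmiles() from the report above are used. Whether the two outputs actually agree has not been verified here.

```python
def canonical_then_aromatize(indigo, smiles):
    """Canonicalize first, reload, then aromatize (assumed workaround)."""
    mol = indigo.loadMolecule(smiles)
    # Step 1: canonical SMILES of the input as given, before any aromatize()
    # call, so that both atom orderings are meant to yield the same string.
    pre_canonical = mol.canonicalSmiles()
    # Step 2: reload, so aromatization sees the atoms in canonical order.
    reloaded = indigo.loadMolecule(pre_canonical)
    reloaded.aromatize()
    return reloaded.canonicalSmiles()


def main():
    from indigo import Indigo  # same import as in the original report

    indigo = Indigo()
    kekules = [
        "C12=C3C4=C(C=C5CC6=CC7=CC=C8C=C9C%10=C%11C8=C7C7=C6C5=C4C(=C7%11)C3=C%10C(=C9)C1)C=C2",
        "C12=C3C4=C(CC5=CC6=CC7=CC=C8C9=C7C7=C6C5=C4C4=C7C9=C5C(=C8)CC(=C1)C5=C43)C=C2",
    ]
    results = [canonical_then_aromatize(indigo, k) for k in kekules]
    for result in results:
        print(result)
    # Report whether the workaround gave a single canonical form.
    print("identical:", len(set(results)) == 1)


if __name__ == "__main__":
    try:
        main()
    except ImportError:
        print("Indigo Python bindings not found; skipping the demo.")
```

If the final line prints "identical: False", this interpretation of the advice is not sufficient for the Indigo version in use.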