AstraZeneca / jazzy

Fast calculation of hydrogen-bond strengths and free energy of hydration of small molecules.
https://jazzy.readthedocs.io/en/latest/

RDKit embedding fails for valid SMILES #194

Closed: ghiander closed this issue 1 year ago

ghiander commented 1 year ago

Using jazzy==0.0.7


```
>>> smiles="C1N[C@@H]2CO[C@@H]1C2"
>>> molecular_vector_from_smiles(smiles)

Jazzy ERROR: [16:25:17] The RDKit embedding has failed for the molecule: C1N[C@@H]2CO[C@@H]1C2
---------------------------------------------------------------------------
JazzyError                                Traceback (most recent call last)
<ipython-input-1-f7ab322d6341> in <module>
      1 from jazzy.api import molecular_vector_from_smiles
      2 smiles="C1N[C@@H]2CO[C@@H]1C2"
----> 3 molecular_vector_from_smiles(smiles)

~/miniconda3/lib/python3.8/site-packages/jazzy-local_beta-py3.8.egg/jazzy/api.py in molecular_vector_from_smiles(smiles, minimisation_method, only_strengths)
     35     """
     36     # Calculate basic descriptors
---> 37     rdkit_mol, kallisto_mol = __smiles_to_molecule_objects(smiles, minimisation_method)
     38     atoms_and_nbrs = get_covalent_atom_idxs(rdkit_mol)
     39     kallisto_charges = get_charges_from_kallisto_molecule(kallisto_mol, charge=0)

~/miniconda3/lib/python3.8/site-packages/jazzy-local_beta-py3.8.egg/jazzy/api.py in __smiles_to_molecule_objects(smiles, minimisation_method)
     22     rdkit_mol = rdkit_molecule_from_smiles(smiles, minimisation_method=minimisation_method)
     23     if rdkit_mol is None:
---> 24         raise JazzyError("The SMILES '{}' appears to be invalid.".format(smiles))
     25     kallisto_mol = kallisto_molecule_from_rdkit_molecule(rdkit_mol)
     26     return rdkit_mol, kallisto_mol

JazzyError: The SMILES 'C1N[C@@H]2CO[C@@H]1C2' appears to be invalid.
```

(Note that the structure "C1N[C@@H]2CO[C@H]1C2", which differs only in the chirality tag of the second stereocentre, works fine.)
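To check the two stereoisomers outside Jazzy, a minimal RDKit sketch can separate the two steps that the traceback reports as one: parsing the SMILES and generating a 3D conformer. The helper `can_embed` is hypothetical, and it assumes RDKit's default `EmbedMolecule` settings with a fixed seed; Jazzy's own embedding and minimisation (`minimisation_method`) may behave differently.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def can_embed(smiles: str, seed: int = 0xF00D) -> bool:
    """Hypothetical helper: True if RDKit can generate a 3D conformer for `smiles`.

    Assumes RDKit's default embedding parameters, not Jazzy's own settings.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # The SMILES could not even be parsed.
        return False
    mol = Chem.AddHs(mol)
    # EmbedMolecule returns the new conformer id on success and -1 on failure.
    return AllChem.EmbedMolecule(mol, randomSeed=seed) != -1


for smi in ("C1N[C@@H]2CO[C@@H]1C2", "C1N[C@@H]2CO[C@H]1C2"):
    parsed = Chem.MolFromSmiles(smi) is not None
    print(f"{smi}: parsed={parsed}, embedded={can_embed(smi)}")
```

If both strings parse but only the second one embeds, the "appears to be invalid" message is reporting an embedding failure rather than a SMILES syntax error, which is consistent with the "RDKit embedding has failed" log line above.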
ghiander commented 1 year ago

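After inspecting the issue in detail, it turns out that the molecule is indeed invalid from a stereochemical point of view. RDKit fails to embed the molecule because the specified configuration of the two bridgehead atoms leads to an impossible geometry: in a small bridged bicyclic system such as this one, the bridgehead configurations are coupled, so not every combination of `@`/`@@` tags can be realised in 3D. A solution would be to regenerate the stereochemistry of the compound, but that should not be done by Jazzy.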
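Since regenerating the stereochemistry is left to the caller, one possible pre-processing step (a sketch, not part of Jazzy) is to discard the input stereo tags and let RDKit's `EnumerateStereoisomers` keep only the stereoisomers it can embed, via `StereoEnumerationOptions(tryEmbedding=True)`. The helper name `embeddable_stereoisomers` is hypothetical, and deciding which of the returned isomers is the chemically intended one remains a manual choice.

```python
from typing import List

from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import (
    EnumerateStereoisomers,
    StereoEnumerationOptions,
)


def embeddable_stereoisomers(smiles: str) -> List[str]:
    """Hypothetical helper: canonical SMILES of stereoisomers RDKit can embed in 3D."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # Discard the input's (possibly impossible) stereo tags; modifies mol in place.
    Chem.RemoveStereochemistry(mol)
    opts = StereoEnumerationOptions(
        onlyUnassigned=True,  # every centre is unassigned after RemoveStereochemistry
        tryEmbedding=True,    # drop isomers for which 3D embedding fails
        unique=True,
    )
    isomers = EnumerateStereoisomers(mol, options=opts)
    return sorted({Chem.MolToSmiles(iso) for iso in isomers})


if __name__ == "__main__":
    for smi in embeddable_stereoisomers("C1N[C@@H]2CO[C@@H]1C2"):
        print(smi)
```

One of the returned SMILES could then be passed to `molecular_vector_from_smiles`. Discarding every stereo tag is a deliberately blunt choice; if some centres are known to be correct, only the suspect ones would need to be cleared before enumerating.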