ebi-chebi / ChEBI

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.
https://www.ebi.ac.uk/chebi
Creative Commons Attribution 4.0 International

Error in SMILES for CHEBI:26355 #4329

Open hrp1000 opened 1 year ago

hrp1000 commented 1 year ago

Hi, I have a problem with this entry. I am trying to generate fingerprints so that I can calculate the Tanimoto coefficient against CHEBI:60344. With the code below, this works fine for CHEBI:17627 but fails for CHEBI:26355:

    from rdkit import Chem, DataStructs
    from bioservices import ChEBI

    heme = ChEBI()
    heme_chebi_id = "CHEBI:60344"
    heme_smiles = heme.getCompleteEntity(heme_chebi_id).smiles
    target = Chem.MolFromSmiles(heme_smiles)
    fp2 = Chem.RDKFingerprint(target)

    for chebi_id in ["CHEBI:17627", "CHEBI:26355"]:
        ch = ChEBI()
        smiley = ch.getCompleteEntity(chebi_id).smiles
        print("reference:", heme_chebi_id)
        print("target: ", chebi_id)
        print("reference:", heme_smiles)
        print("target: ", smiley)
        ref = Chem.MolFromSmiles(smiley)
        fp1 = Chem.RDKFingerprint(ref)
        Tan = DataStructs.TanimotoSimilarity(fp1, fp2)
        print(Tan)
        print("-" * 64)
    exit()
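For context, the Tanimoto coefficient computed by the script is the number of bits set in both fingerprints divided by the number of bits set in either one. The following is a minimal pure-Python sketch of that formula. It uses plain sets of on-bit indices as a stand-in for RDKit's bit vectors; the function name `tanimoto` and the toy fingerprints are hypothetical and are not part of RDKit or bioservices.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        # Both fingerprints are empty; returning 0.0 is an assumed convention.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Toy fingerprints (hypothetical bit positions, not real molecules).
fp_ref = {1, 4, 7, 9}
fp_target = {1, 4, 8}

# 2 shared bits out of 5 distinct bits.
print(tanimoto(fp_ref, fp_target))  # 0.4
```

Because the coefficient needs a fingerprint for both molecules, a structure that cannot be parsed has no similarity value at all, which is why the second ID in the loop fails outright rather than returning a low score.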

amalik01 commented 1 year ago

Historically, coordination bonds have been depicted in a variety of ways, but IUPAC recommends depicting them as regular 'plain' bonds, as shown in CHEBI:26355 (for other examples, see https://iupac.qmul.ac.uk/tetrapyrrole/TP8.html).

Some existing software, including RDKit, cannot properly interpret coordination bonds drawn as single bonds without charges: the structure does not satisfy the software's strict valence criteria, so no fingerprint can be generated. You can either contact the RDKit developers about this issue or add charges to the structure to satisfy RDKit's criteria.
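To illustrate why adding charges changes the outcome, here is a deliberately simplified valence check for nitrogen. It is not RDKit's implementation: the rule (three bonds for neutral nitrogen, adjusted by the formal charge), the function name `nitrogen_valence_ok`, and the bond orders in the example are assumptions made for illustration only.

```python
def nitrogen_valence_ok(bond_orders, formal_charge=0):
    """Toy check: does the summed bond order fit nitrogen's allowed valence?

    Assumed rule: neutral N allows a total bond order of 3, and each unit
    of formal charge shifts that limit by one (so N+ allows 4).
    """
    allowed = 3 + formal_charge
    return sum(bond_orders) <= allowed


# Assumed Kekule-style ring nitrogen: one C=N double bond, one C-N single
# bond, plus a plain single bond to the metal centre.
coordinated_n = [2, 1, 1]

print(nitrogen_valence_ok(coordinated_n))                   # False: 4 > 3
print(nitrogen_valence_ok(coordinated_n, formal_charge=1))  # True: 4 <= 4
```

In this toy model the plain bond to the metal pushes a neutral nitrogen over its limit, while the same atom with a +1 formal charge passes, which mirrors the trade-off between the recommended depiction and a toolkit's valence rules.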

hrp1000 commented 1 year ago

I'm not going to edit the structure, since my hope is that it is correct in the major publicly accessible databases, so I think I'll have to contact RDKit...