chemosim-lab / ProLIF

Interaction Fingerprints for protein-ligand complexes and more
https://prolif.readthedocs.io
Apache License 2.0

Default halogen bond SMARTS ignores carbonyls #189

Closed asiomchen closed 3 months ago

asiomchen commented 4 months ago

Recently, I have been trying to detect halogen bonds with ProLIF and, to my surprise, tyrosine was not recognized as an acceptor despite having a carbonyl near the ligand (Schrödinger's Maestro detects it successfully). It turns out that the default SMARTS only matches a single (or aromatic) bond between the two atoms of the acceptor, because a bond left unspecified in SMARTS means "single or aromatic". Changing the original SMARTS from [#7,#8,P,S,Se,Te,a;!+{1-}][*] to [#7,#8,P,S,Se,Te,a;!+{1-}]!#[*] makes the carbonyl oxygen match:

from rdkit import Chem
from rdkit.Chem import Draw, AllChem
from copy import copy
from IPython.display import display
from itertools import chain

def plot_2D_highlight(mol, matches):
    ligand_2d = copy(mol)
    AllChem.Compute2DCoords(ligand_2d)
    matches = list(chain(*matches))
    if len(matches) == 0:
        display(Draw.MolToImage(ligand_2d, size=(400, 200)))
    else:
        display(Draw.MolToImage(ligand_2d, highlightAtoms=matches, size=(400, 200)))

original_pattern = Chem.MolFromSmarts("[#7,#8,P,S,Se,Te,a;!+{1-}][*]")

# The new pattern accepts any bond other than a triple bond in the acceptor, not just a single (or aromatic) one.
# Nitriles are probably rare as acceptors, so excluding triple bonds should be fine, but the pattern now matches
# carbonyls, which are acceptors in many complexes available in the PDB (according to Auffinger et al., PNAS 2004).

new_pattern = Chem.MolFromSmarts("[#7,#8,P,S,Se,Te,a;!+{1-}]!#[*]")
histidine = Chem.MolFromSmiles("C1=C(NC=N1)CC(C(=O)O)N")
# A nitrile is added to the tyrosine just to show that neither the new pattern nor the original one matches its nitrogen
tyrosine = Chem.MolFromSmiles("C1=C(C#N)C(=CC=C1C[C@@H](C(=O)O)N)O")

# Flatten the (first atom, second atom) match tuples into plain lists of atom indices for highlighting
original_pattern_matches_his = list(chain(*histidine.GetSubstructMatches(original_pattern)))
new_pattern_matches_his = list(chain(*histidine.GetSubstructMatches(new_pattern)))
original_pattern_matches_tyr = list(chain(*tyrosine.GetSubstructMatches(original_pattern)))
new_pattern_matches_tyr = list(chain(*tyrosine.GetSubstructMatches(new_pattern)))
Draw.MolsToGridImage([histidine, histidine, tyrosine, tyrosine], molsPerRow=2, subImgSize=(400, 200),
                     legends=["Original pattern", "New pattern", "Original pattern", "New pattern"],
                     highlightAtomLists=[original_pattern_matches_his,
                                         new_pattern_matches_his, original_pattern_matches_tyr,
                                         new_pattern_matches_tyr])
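
The same comparison can also be checked without rendering an image. The following is a minimal sketch, assuming only RDKit is installed: the helper name `first_atom_matches` and the two probe SMARTS used to locate the carbonyl oxygen and the nitrile nitrogen are illustrative choices of mine, not part of ProLIF. It reports which atoms each pattern picks up as the first (acceptor) atom.

```python
from rdkit import Chem

ORIGINAL_SMARTS = "[#7,#8,P,S,Se,Te,a;!+{1-}][*]"
PROPOSED_SMARTS = "[#7,#8,P,S,Se,Te,a;!+{1-}]!#[*]"


def first_atom_matches(mol, smarts):
    """Return the indices of atoms matched by the first atom of a SMARTS (hypothetical helper)."""
    pattern = Chem.MolFromSmarts(smarts)
    # uniquify=False keeps both orientations of a symmetric match,
    # so no atom that can act as the first pattern atom is dropped.
    matches = mol.GetSubstructMatches(pattern, uniquify=False)
    return {match[0] for match in matches}


# Same nitrile-substituted tyrosine as in the snippet above
tyrosine = Chem.MolFromSmiles("C1=C(C#N)C(=CC=C1C[C@@H](C(=O)O)N)O")

original_acceptors = first_atom_matches(tyrosine, ORIGINAL_SMARTS)
proposed_acceptors = first_atom_matches(tyrosine, PROPOSED_SMARTS)

# Probe patterns (assumptions for this sketch): terminal O double-bonded to C, terminal N triple-bonded to C
carbonyl_oxygens = first_atom_matches(tyrosine, "[OX1]=[#6]")
nitrile_nitrogens = first_atom_matches(tyrosine, "[NX1]#[#6]")

print("acceptors gained with the proposed pattern:", sorted(proposed_acceptors - original_acceptors))
print("carbonyl oxygen indices:", sorted(carbonyl_oxygens))
print("nitrile N matched by either pattern:", bool(nitrile_nitrogens & (original_acceptors | proposed_acceptors)))
```

If the reasoning above holds, the only atoms gained by the proposed pattern should be the carbonyl oxygens, and the nitrile nitrogen should stay unmatched by both patterns.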

Maybe this SMARTS should become the default?