PatWalters / rd_filters

A script to run structural alerts using the RDKit and ChEMBL
MIT License

Mismatching pattern due to RDKit aromaticity model #17

Open DrrDom opened 2 years ago

DrrDom commented 2 years ago

I started to play with different filters and found that many compounds were rejected by some of them, so I investigated individual cases. One example is the Filter82_pyridinium rule ([c,n]1[c,n][c,n][c,n][c,n]n(C)1) from the Inpharmatica set. RDKit aromatizes some compounds, like the one in the example below, even with the AROMATICITY_SIMPLE model. As a result the SMARTS pattern matches, which I consider a false positive. Was this pattern intended to remove all such compounds, or should it apply only to compounds with a charged nitrogen ([c,n]1[c,n][c,n][c,n][c,n][n+](C)1)? Is there another workaround? Or is this rather an issue with the RDKit aromaticity model?

from rdkit import Chem

smi = 'COC1=C2N(C)C(=O)C3=C(OC(C)(C)C=C3)C2=CC=C1'
m = Chem.MolFromSmiles(smi, sanitize=False)
Chem.SetAromaticity(m, Chem.AROMATICITY_SIMPLE)

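sma = '[c,n]1[c,n][c,n][c,n][c,n][n](C)1'   # Filter82_pyridinium rule
pat = Chem.MolFromSmarts(sma)

# The call that produced the output below is missing from the post;
# GetSubstructMatch is assumed here because it returns a single tuple of atom indices.
print(m.GetSubstructMatch(pat))

(3, 16, 9, 8, 6, 4, 5)

PatWalters commented 2 years ago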

The patterns were taken directly from ChEMBL with a few tweaks to make them work with the RDKit. One day, when I get some time, I'll do some curation. I'd be happy to accept PRs from others who can improve the pattern.
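
One way to evaluate the charged-nitrogen variant raised in the question is to run both patterns side by side on the reported molecule and on a genuine pyridinium. The following is only a minimal sketch, not a curated replacement for the rule. It assumes default RDKit sanitization rather than the AROMATICITY_SIMPLE setup used in the report, and the N-methylpyridinium SMILES and the helper name match_table are illustrative additions that do not appear in the thread.

```python
from rdkit import Chem

# Original rule and the charged-nitrogen variant proposed in the question.
BROAD = Chem.MolFromSmarts('[c,n]1[c,n][c,n][c,n][c,n]n(C)1')
CHARGED = Chem.MolFromSmarts('[c,n]1[c,n][c,n][c,n][c,n][n+](C)1')

# Test molecules: the N-methyl lactam from the report and, as an
# illustrative assumption, N-methylpyridinium (a genuine pyridinium).
EXAMPLES = {
    'lactam from the report': 'COC1=C2N(C)C(=O)C3=C(OC(C)(C)C=C3)C2=CC=C1',
    'N-methylpyridinium': 'C[n+]1ccccc1',
}


def match_table(examples=EXAMPLES):
    """Return {name: (matches_broad, matches_charged)} for each SMILES."""
    results = {}
    for name, smi in examples.items():
        mol = Chem.MolFromSmiles(smi)  # default sanitization and aromaticity
        results[name] = (
            mol.HasSubstructMatch(BROAD),
            mol.HasSubstructMatch(CHARGED),
        )
    return results


for name, (broad, charged) in match_table().items():
    print(f'{name}: broad={broad} charged={charged}')
```

In this sketch the neutral lactam should be flagged only by the original pattern, while the pyridinium should be flagged by both, since an unqualified n in SMARTS matches aromatic nitrogen regardless of charge. Whether restricting the rule to [n+] matches the intent of the original Inpharmatica filter remains an open question in this thread.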