PatWalters / rd_filters

A script to run structural alerts using the RDKit and ChEMBL
MIT License
125 stars, 37 forks

CF2 SMARTS #3

Closed (adalke closed this 5 years ago)

adalke commented 5 years ago

In Notes.txt you write that you replaced $(CF2) with [CF2] to make it something that RDKit will parse.

However, "[CF2]" means "an atom that is carbon, is fluorine, and has isotope number 2", which can never match.

I think the SMARTS $(CF2) is meant to be $(C(F)F). The "CF2" is supposed to be a SMARTS expression, but here the "2" is read as a ring-closure digit that is never closed. SMARTSViewer also complains about $(CF2), saying "SMARTS Warning: Unpaired ring bonds!"
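To make the unclosed ring closure concrete, here is a minimal sketch in plain Python that flags ring-closure labels which are opened but never paired. The function name `unpaired_ring_closures` is hypothetical, and the check is only a heuristic: it assumes every digit outside square brackets is a ring-closure label and skips everything inside brackets (isotopes, counts, recursive SMARTS). It is not a SMARTS parser and is not how RDKit or SMARTSViewer do it.

```python
def unpaired_ring_closures(smarts):
    """Return the set of ring-closure labels that are never paired.

    Heuristic only (hypothetical helper): digits outside [...] are treated
    as ring-closure labels; anything inside brackets is ignored.
    """
    open_labels = set()
    depth = 0  # nesting depth of square brackets
    i = 0
    while i < len(smarts):
        ch = smarts[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 0:
            label = None
            two = smarts[i + 1:i + 3]
            if ch == "%" and len(two) == 2 and two.isdigit():
                # Two-digit ring closure written as %nn
                label = two
                i += 2
            elif ch.isdigit():
                label = ch
            if label is not None:
                # First sighting opens the ring, second sighting closes it
                open_labels ^= {label}
        i += 1
    return open_labels


print(unpaired_ring_closures("CF2"))       # the inner expression of $(CF2)
print(unpaired_ring_closures("C(F)F"))     # the suggested replacement
print(unpaired_ring_closures("C1CCCCC1"))  # a properly closed ring
```

Under these assumptions, only the first call reports a dangling label ("2"); the other two return an empty set.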

This suggests that the ChEMBL tool ignores the illegal SMARTS. I wonder whether it matches CF (even when there aren't two fluorines), ignores that term (i.e., never matches CF at all), or ignores any matches.

I see nothing wrong with your other fixes.

PatWalters commented 5 years ago

Thanks, Andrew, I'll change that.

On Fri, Oct 5, 2018 at 7:02 AM Andrew Dalke wrote:

In Notes.txt you write that you replaced $(CF2) with [CF2] to make it something that RDKit will parse. [...]

pieterhbos commented 5 years ago

To add to this, alert_collection.csv contains 10 additional cases of "CF2": one in rule 196 and nine in rule 245. Replacing [CF2] with C(F)F fixes the issue.
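A rough sketch of that bulk replacement, using only the Python standard library. The column names `rule_id` and `smarts`, the helper name `fix_cf2`, and the sample rows are all assumptions made up for illustration; they are not copied from the real alert_collection.csv.

```python
import csv
import io


def fix_cf2(csv_text, column="smarts", bad="[CF2]", good="C(F)F"):
    """Replace `bad` with `good` in one CSV column.

    Returns the rewritten CSV text and a {rule_id: occurrences} dict.
    Column names are assumed, not taken from the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    counts = {}
    for row in reader:
        n = row[column].count(bad)
        if n:
            counts[row["rule_id"]] = n
            row[column] = row[column].replace(bad, good)
        writer.writerow(row)
    return out.getvalue(), counts


# Made-up sample rows, for illustration only
SAMPLE = '''rule_id,smarts
1,[$([CF2]);!$(C=O)]
2,c1ccccc1
3,"[$([CF2]),$(N[CF2])]"
'''

fixed, counts = fix_cf2(SAMPLE)
print(counts)
print(fixed)
```

One caveat with a purely textual swap: if anything followed [CF2] inside the same expression, it would end up bonded to the second fluorine after the replacement, so writing C(F)(F) would be the safer form in that case. It is worth checking the affected rules by eye.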

PatWalters commented 5 years ago

Sorry, thought I had fixed that one. The rules have been updated.