CDK-R / cdkr

Integrating R and the CDK
https://cdk-r.github.io/cdkr/

rcdk::matches() function bugs #136

Open YANGJJ93MS opened 1 year ago

YANGJJ93MS commented 1 year ago

There is an issue with the substructure match function. I get TRUE even though the substructure is not present in the molecule being searched.


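library(rcdk)
# Target molecule; parse.smiles() returns a list of parsed molecules
mol <- parse.smiles('CN(C)c1cccc2c(S(=O)(=O)Oc3ccc4c5c3OC3C(=O)CC(O)C6(O)C(C4)N(CC4CC4)CCC536)cccc12')
# Substructure pattern, passed to matches() as a string
query <- 'CN(C)CCc1ccccc1'
rcdk::matches(query, mol)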

Screenshots

"rcdk::matches(query2,mol1) CN(C)c1cccc2c(S(=O)(=O)Oc3ccc4c5c3OC3C(=O)CC(O)C6(O)C(C4)N(CC4CC4)CCC536)cccc12.match TRUE"


zachcp commented 1 year ago

Hi, thanks for your report. So I am less familiar with the SMARTS but here's what I've found:

rajarshi commented 1 year ago

If you try it out at https://www.simolecule.com/cdkdepict/depict.html, the supplied SMARTS pattern does match the molecule.

However, I agree that we should fix the function to use SMARTSPattern.
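
For anyone who wants to check the match locally, here is a minimal sketch in R. It assumes that the rcdk package and a Java runtime are installed, and that `matches()` accepts a `return.matches` argument, as in recent rcdk releases; the variable names `target`, `pattern` and `res` are only illustrative. Keep in mind that in SMARTS an uppercase `C` matches only aliphatic carbon and a lowercase `c` only aromatic carbon, and that `c1ccccc1` matches any six-membered aromatic carbon ring, including one that is part of a larger ring system.

```r
library(rcdk)

# Target molecule from the report; parse.smiles() returns a list, so take the first entry
target <- parse.smiles("CN(C)c1cccc2c(S(=O)(=O)Oc3ccc4c5c3OC3C(=O)CC(O)C6(O)C(C4)N(CC4CC4)CCC536)cccc12")[[1]]

# Pattern from the report, interpreted as SMARTS by matches()
pattern <- "CN(C)CCc1ccccc1"

# Assumption: return.matches = TRUE returns the atom mappings along with the match flag
res <- matches(pattern, target, return.matches = TRUE)

# Inspect whatever structure comes back (match status and mapped atom indices)
str(res)
```

Comparing the mapped atom indices against a depiction of the molecule should show which atoms the pattern is landing on.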


YANGJJ93MS commented 1 year ago

Dear Rajarshi,

Thank you for your reply!

I found that the RDKit substructure matching function makes the same mistake. Please see the picture below: figure-cdkit

As a matter of fact, the substructure that I am looking for is a benzene ring with an amine side chain, which is totally different from the naphthalene structure. I believe the reason is that the algorithm treated the naphthalene structure as an alkyl structure.

Best regards, Junjie