cdk / nodes4knime-cdk2

KNIMES nodes using the CDK library.
1 stars 2 forks source link

upgrade the SMARTS matching to the new API #3

Open egonw opened 2 years ago

egonw commented 2 years ago

Intermediate solution, use the legacy API.

webbres commented 1 year ago

I have redone (with as minimal changes to the code as possible) the SMARTS Query node.

SmartSMARTSQuery has been replaced with https://github.com/webbres/nodes4knime-cdk2/blob/cdk2-upgrade/org.openscience.cdk.knime/src/org/openscience/cdk/knime/nodes/smarts/MultipleSmartsMatcher.java

The SmartsWorker has been updated: https://github.com/webbres/nodes4knime-cdk2/blob/cdk2-upgrade/org.openscience.cdk.knime/src/org/openscience/cdk/knime/nodes/smarts/SmartsWorker.java

On top of switching to use the new API (I think?) I've reduced the number of matching calls from up to 3 to 1 - at least it looks to me like the old code does 1 match to find if there's a match, another match to get the unique counts and then another match to get the mappings if all column outputs are requested.

I'm not sure if I've done a like for like replacement though. The old code filters the mappings using SmartsStereoMatch and ComponentGrouping. I don't know if an equivalent is needed with the new API?

https://github.com/webbres/nodes4knime-cdk2/blob/cdk2-upgrade/org.openscience.cdk.knime/src/org/openscience/cdk/smiles/smarts/SmartSMARTSQueryTool.java

johnmay commented 1 year ago

The filters are now done automatically underneath the SMARTS pattern. In fact any query now is inspected to determine if they are needed or not:

https://github.com/cdk/cdk/blob/master/base/isomorphism/src/main/java/org/openscience/cdk/isomorphism/Pattern.java#L45-L74

johnmay commented 1 year ago

Not a bad idea to add some tests though,

C[C@H](CC)O = matches => C[C@H](CC)O
C[C@H](CC)O = matches => C[C@@H](O)CC
C[C@H](CC)O = should not match => C[C@H](O)CC
C[C@H](CC)O = should not match => C[C@@H](CC)O
webbres commented 1 year ago

Thanks John. I've added that as a workflow test and the matching is correct