datamol-io / datamol

Molecular Processing Made Easy.
https://docs.datamol.io
Apache License 2.0
452 stars 47 forks source link

A couple of problems with dm.viz.lasso_highlight_image #195

Closed PatWalters closed 1 year ago

PatWalters commented 1 year ago

I like the functionality of dm.viz.lasso_highlight_image, but there are a couple of things that could be improved.

1. Aromatic queries don't work.

If you try this

dm.viz.lasso_highlight_image("CC(N)Cc1c[nH]c2ccc3c(c12)CCCO3","c1ccccc1")

you'll get

2023-06-14 17:57:38.913 | WARNING  | datamol.viz._lasso_highlight:lasso_highlight_image:448 - no matching substructure found for c1ccccc1
2023-06-14 17:57:38.915 | WARNING  | datamol.viz._lasso_highlight:lasso_highlight_image:484 - No matches found for the given search molecules

This should work. I think the problem is that you're calling

mol = prepare_mol_for_drawing(target_molecule, kekulize=True)

before you do the matching. This converts the RDKit molecule to kekule form, then it won't match any aromatic queries. I think you could fix this by simply doing the matching before you call prepare_mol_for_drawing.

2. SMARTS don't work

If you do this

dm.viz.lasso_highlight_image("CC(N)Cc1c[nH]c2ccc3c(c12)CCCO3","[#6]")

you again get

2023-06-14 18:05:44.489 | WARNING  | datamol.viz._lasso_highlight:lasso_highlight_image:448 - no matching substructure found for [#6]
2023-06-14 18:05:44.490 | WARNING  | datamol.viz._lasso_highlight:lasso_highlight_image:484 - No matches found for the given search molecules

This appears to because you are are parsing the SMARTS with dm.to_mol which won't work for SMARTS.

smart_obj = dm.to_mol(smart_str)

This would work better if you did

smart_obj = Chem.MolFromSmarts(smart_str)

3. A couple of requests

It would be great to do lasso highlighting with a grid containing multiple images. It would be helpful to have a function that would accept a list of lists of atom indices to be highlighted in addition to the existing function that takes SMILES.

hadim commented 1 year ago

Thanks for the feedbacks. This feature is quite new and so not fully mature yet!

Thanks for 1) and 2), those are indeed two implementation mistakes.

As for 3), it would be nice to see those two features indeed and we already thought about those but didn't get the time to implement them yet!

hadim commented 1 year ago

This is fixed in #198

image

Thanks for reporting the bugs but also for digging into the code and debugging those issues. This is much appreciated!


It would be great to do lasso highlighting with a grid containing multiple images.

I won't be able to tackle that short term but will open an issue so we don't forget!

It would be helpful to have a function that would accept a list of lists of atom indices to be highlighted in addition to the existing function that takes SMILES.

In the same PR as above, you can now do:

image

PatWalters commented 1 year ago

This is fantastic, thank you!

hadim commented 1 year ago

For the record: https://github.com/datamol-io/datamol/issues/199