cbouy / mols2grid

Interactive molecule viewer for 2D structures
https://mols2grid.readthedocs.io
Apache License 2.0

Displaying highlights using mols2grid #42

Closed andresilvapimentel closed 1 year ago

andresilvapimentel commented 1 year ago

I was trying to display fragment highlights for a set of molecules stored in a pandas dataframe using mols2grid, but the grid only shows the molecules, without the fragment highlights. Please see the code below:

fragment_list = []
for id in active_id_list:
    exp = explainer.explain_instance(test_dataset.X[id], model_fn, num_features=100, top_labels=1)
    key = list(exp.as_map().keys())[0]
    my_fragments = fp_mol(Chem.MolFromSmiles(test_dataset.ids[id]))
    fragment_weight = dict(exp.as_map()[key])
    for index in my_fragments:
        if index in fragment_weight:
            m = Chem.MolFromSmiles(test_dataset.ids[id])
            substructure = Chem.MolFromSmarts(list(my_fragments[index])[0])
            m.GetSubstructMatches(substructure)
            fragment_list.append({'id': id, 'Smiles': test_dataset.ids[id], 'p': key, 'index': index, 'fragments': my_fragments[index], 'weight': fragment_weight[index], 'Highlights': m})
df1 = pd.DataFrame(fragment_list)
df1
mols2grid.display(df1, mol_col="Highlights")

Am I doing something wrong? How can I do it? Thanks.

cbouy commented 1 year ago

Hi @andresilvapimentel,

To generate the depictions using the molecule object directly, you need to add prerender=True to mols2grid.display.

By default, mols2grid internally converts each molecule to a SMILES string and then uses the JavaScript version of RDKit to create the depictions as needed. This uses a lot less memory because the images for all molecules don't need to be pre-generated, but it also means that any other information stored on the molecule object (such as highlights) is lost, hence your issue here.
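As a rough illustration of the point above, here is a minimal sketch of how highlight information could be kept on the molecule object before calling mols2grid.display with prerender=True. The helper names highlight_atoms and mol_with_highlights are hypothetical, and storing the atom indices in the __sssAtoms attribute (the attribute RDKit's own pandas and notebook rendering uses for highlighting) is an assumption about what the prerendering step picks up, so check it against your installed version.

```python
def highlight_atoms(matches):
    """Flatten substructure match tuples into a sorted list of unique atom indices."""
    return sorted({idx for match in matches for idx in match})


def mol_with_highlights(smiles, smarts):
    """Build an RDKit molecule that carries the atoms matching `smarts` as highlights."""
    # RDKit is imported here so that highlight_atoms stays usable without it.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    pattern = Chem.MolFromSmarts(smarts)
    atoms = highlight_atoms(mol.GetSubstructMatches(pattern))
    # Assumption: the prerendered depiction reads highlights from __sssAtoms.
    setattr(mol, "__sssAtoms", atoms)
    return mol
```

With a column of such molecules, the call would then be mols2grid.display(df1, mol_col="Highlights", prerender=True). Because a SMILES string cannot carry the highlight list, the default (non-prerendered) mode would drop it again.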

Please close the issue if that solves your problem, and don't hesitate to ask if you have other questions!

Cedric

andresilvapimentel commented 1 year ago

Hi @cbouy Thanks!!! It worked nicely!!!

I have another question before closing the issue.

Is it possible to display the images of the molecules and their fingerprints side by side (similar to a pandas dataframe) using the following code? The fingerprint figure is a <PIL.PngImagePlugin.PngImageFile image, which does not display well in a pandas dataframe in Google Colab.

 prints = [(m, x, bi) for x in list(result)]
 figure = Draw.DrawMorganBits(prints, molsPerRow = 4, legends = [str(x) for x in list(result)])
 figure.save("figure"+str(id)+".png","png")
 figure_list.append("figure"+str(id)+".png")
 smiles_list.append(test_dataset.ids[id])

df = pd.DataFrame({'smiles': smiles_list})
PandasTools.AddMoleculeColumnToFrame(df,'smiles','Molecule')
df['Fragments'] = figure_list
df
mols2grid.display(df, smiles_col="Fragments", prerender = True)

I tried and got the following error message:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

16 frames pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'mols2grid-tooltip'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
KeyError: 'mols2grid-tooltip'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/pandas/core/internals/blocks.py in check_ndim(values, placement, ndim)
   1977         )
   1978     if len(placement) != len(values):
-> 1979         raise ValueError(
   1980             f"Wrong number of items passed {len(values)}, "
   1981             f"placement implies {len(placement)}"

ValueError: Wrong number of items passed 6, placement implies 1

cbouy commented 1 year ago

Side by side is not possible, but you can display one below the other (although I doubt that it will be readable). Here's an example that adds a scaffold image below each molecule; just replace the scaffold drawing with your code for drawing the fingerprints (make sure to use useSVG=True):

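import mols2grid
from rdkit.Chem import Draw
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_svg(mol):
  # Draw the Murcko scaffold of the molecule and return it as an SVG string
  scaffold = MurckoScaffold.GetScaffoldForMol(mol)
  d = Draw.MolDraw2DSVG(160, 120)
  d.DrawMolecule(scaffold)
  d.FinishDrawing()
  return d.GetDrawingText()

# df is assumed to hold RDKit molecules in a "mol" column
df["scaffold"] = df["mol"].apply(scaffold_svg)
# Show the scaffold SVG below each molecule image
mols2grid.display(df, mol_col="mol", subset=["mols2grid-id", "img", "scaffold"])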

Another option would be to use a custom Python callback to display the fingerprint image below the grid.
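A minimal sketch of what such a callback could look like, under a few assumptions: that mols2grid.display accepts a Python function through a callback argument and passes it a dictionary named data for the clicked cell, and that this dictionary contains a "SMILES" key. The names fingerprint_panel, make_callback and svg_by_smiles are hypothetical; svg_by_smiles stands for a dictionary you would fill yourself with one fingerprint SVG string per SMILES.

```python
from html import escape


def fingerprint_panel(data, svg_by_smiles):
    """Return an HTML snippet with the fingerprint drawing for the clicked cell."""
    # Assumption: the cell data passed to the callback has a "SMILES" entry.
    smiles = data.get("SMILES", "")
    svg = svg_by_smiles.get(smiles, "<em>no fingerprint image</em>")
    return f'<div class="fp-panel"><p>{escape(smiles)}</p>{svg}</div>'


def make_callback(svg_by_smiles):
    """Build a callback that shows the fingerprint panel below the grid."""
    def on_click(data):
        # Imported here so the HTML helper can be used outside a notebook.
        from IPython.display import HTML, display
        display(HTML(fingerprint_panel(data, svg_by_smiles)))
    return on_click
```

It would then be passed along the lines of mols2grid.display(df, mol_col="mol", callback=make_callback(svg_by_smiles)). Keeping the HTML construction in its own function makes it easy to check without a running notebook.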

andresilvapimentel commented 1 year ago

Nice!!! Thanks!!!