chemosim-lab / ProLIF

Interaction Fingerprints for protein-ligand complexes and more
https://prolif.readthedocs.io
Apache License 2.0
361 stars 68 forks source link

Interface to retrieve atoms in interactions #164

Closed forcefield closed 11 months ago

forcefield commented 11 months ago

In version 2.0.0, the return_atoms argument has been removed from Fingerprint.to_dataframe. But there does not seem to be an alternative interface to extract the atoms that participate in specific interactions. I had to write something like below:

def interactions_of_type( fp, residues, itypes=None):
    """
    Extract the ligand and protein atoms forming interactions of specific types.

    fp: prolif.fingerprint.Fingerprint

    residues: list of residue specifications

    itypes: list of interaction types, such as "VdWContact", "HBAcceptor", etc.

    Returns
    -------

    interactions: a list of interactions in each frame, represented by a dictionary:
       { interaction_type: [ ( <tuple of ligand atoms>, ( <tuple of protein atoms> )) ] }
    """

    interactions = []
    if itypes is None:
        itypes = fp.interactions.keys()
    for frame in fp.ifp:
        fifp = fp.ifp[frame]
        interactions.append( { i: [] for i in itypes })
        for res in residues:
            rifp = fifp[res]
            if len(rifp) == 0: continue
            # This is the an awful and awkward way to access the data in rifp, which is an IFP object
            # of a single item but does not provide a function to get the keys or the values.
            rifp = [ rifp[rid] for rid in rifp ][0]
            for i in itypes:
                ints = rifp.get( i, [])
                ints = [ (i['indices']['ligand'], i['parent_indices']['protein'])
                         for i in ints ]
                interactions[-1][i].extend( ints)
    return interactions

Is there a better way to get the same information as the old Fingerprint.to_dataframe( return_atoms=True)?

cbouy commented 11 months ago

This will give you something similar to the old to_dataframe(return_atoms=True) (assuming you're not doing protein-protein interactions):

import pandas as pd

data = []
index = []
for i, frame_ifp in fp.ifp.items():
    index.append(i)
    frame_data = {}
    for (_, resid), residue_ifp in frame_ifp.items():
        for int_name, metadata_tuple in residue_ifp.items():
            frame_data[(str(resid), int_name)] = [
                {
                    "ligand": metadata["indices"]["ligand"],
                    "protein": metadata["parent_indices"]["protein"],
                }
                for metadata in metadata_tuple
            ]
    data.append(frame_data)

df = pd.DataFrame(data, index=pd.Index(index, name="Frame"))
df.columns = pd.MultiIndex.from_tuples(df.columns, names=["residue", "interaction"])
df
residue VAL200.A ALA216.A ... VAL210.A PRO338.B
interaction VdWContact Hydrophobic ... Hydrophobic VdWContact
Frame
0 [{'ligand': (35,), 'protein': (1978,)}] [{'ligand': (0,), 'protein': (2222,)}] ... NaN NaN
... ... ... ... ... ...
240 [{'ligand': (28,), 'protein': (1976,)}] [{'ligand': (0,), 'protein': (2222,)}] ... NaN NaN

I've made that code snippet compatible with Fingerprint(count=True) but if you're not interested in that you can replace

            frame_data[(str(resid), int_name)] = [
                {
                    "ligand": metadata["indices"]["ligand"],
                    "protein": metadata["parent_indices"]["protein"],
                }
                for metadata in metadata_tuple
            ]

with

            metadata = metadata_tuple[0]
            frame_data[(str(resid), int_name)] = {
                "ligand": metadata["indices"]["ligand"],
                "protein": metadata["parent_indices"]["protein"],
            }

From there you can then use the pandas indexing commands at different levels to do what you need:

df.xs("PHE331.B", level="residue", axis=1).head(3)
Hydrophobic PiStacking VdWContact
[{'ligand': (0,), 'protein': (2668,)}] [{'ligand': (3, 4, 5, 6, 7, 42), 'protein': (2... [{'ligand': (6,), 'protein': (2669,)}]
[{'ligand': (0,), 'protein': (2668,)}] NaN [{'ligand': (0,), 'protein': (2671,)}]
[{'ligand': (0,), 'protein': (2668,)}] NaN NaN
df.xs("HBDonor", level="interaction", axis=1).head(3)
ASP129.A SER212.A THR134.A
[{'ligand': (13, 52), 'protein': (1199,)}] [{'ligand': (1, 44), 'protein': (2180,)}] NaN
[{'ligand': (13, 52), 'protein': (1199,)}] [{'ligand': (1, 44), 'protein': (2180,)}] NaN
[{'ligand': (13, 52), 'protein': (1199,)}] NaN NaN