Closed jparcon closed 4 months ago
Hi,
Not a direct way but this script should give what you seem to be looking for:
import pandas as pd
protein_mol = plf.Molecule.from_mda(protein_selection)
data = []
index = []
for i, frame_ifp in fp.ifp.items():
index.append(i)
frame_data = {}
for (ligres, protres), residue_ifp in frame_ifp.items():
for int_name, metadata_tuple in residue_ifp.items():
for metadata in metadata_tuple:
# extract atom name using atom index in metadata and protein mol
atoms = ",".join(
[
protein_mol.GetAtomWithIdx(x).GetMonomerInfo().GetName()
for x in metadata["parent_indices"]["protein"]
]
)
frame_data[(str(ligres), str(protres), int_name, atoms)] = 1
data.append(frame_data)
df = pd.DataFrame(data, index=pd.Index(index, name="Frame"))
df.columns = pd.MultiIndex.from_tuples(
df.columns, names=["ligand", "protein", "interaction", "atoms"]
)
df = df.fillna(0).astype(int)
df
It gives the dataframe below, where atoms
is the new header level containing the protein atoms involved in the corresponding interaction
ligand LIG1.G ...
protein TRP125.A VAL200.A THR209.A ... TYR38.A VAL200.A
interaction Hydrophobic VdWContact Hydrophobic ... Hydrophobic VdWContact
atoms CZ3 HG11 CG2 ... CD2 HB
Frame ...
0 1 1 1 ... 0 0
10 1 0 0 ... 0 0
... ... ... ... ... ... ...
230 0 0 0 ... 1 0
240 0 0 0 ... 0 1
This is so cool, thank you very much!
Now, when calculating the percentage of poses where each interaction is present, e.g. hydrophobic or ionic interactions become repeated for each atomic interactor (something that in principle would be only desirable for hydrogen bonds). So I made a slight change to get an additional data frame with merged interactors for interactions different from H bonds. I'm posting it in case someone finds it useful too.
# get atomic separated dataframe, only for H bonds
data = []
index = []
for i, frame_ifp in fp.ifp.items():
index.append(i)
frame_data = {}
for (ligres, protres), residue_ifp in frame_ifp.items():
for int_name, metadata_tuple in residue_ifp.items():
if int_name in ["HBAcceptor", "HBDonor"]:
for metadata in metadata_tuple:
# extract atom name using atom index in metadata and protein mol
atoms = ",".join(
[
protein_mol.GetAtomWithIdx(x).GetMonomerInfo().GetName()
for x in metadata["parent_indices"]["protein"]
]
)
frame_data[(str(protres), int_name, atoms.replace(" ", ""))] = 1
else:
frame_data[(str(protres), int_name, "X")] = 1
data.append(frame_data)
df_merge = pd.DataFrame(data, index=pd.Index(index, name="Pose"))
df_merge.columns = pd.MultiIndex.from_tuples(
df_merge.columns, names=["protein_res", "lig_interaction", "protein_ats"]
)
df_merge = df_merge.fillna(0).astype(int)
df_merge.sort_index(axis=1, level="protein_res", ascending=True, inplace=True)
df_merge.head(5)
Is there a direct way of getting the dataframe from the interaction fingerprint with the protein atomic interactors as headers?
When trying to define "important" interactions (let's say those repeated for many ligands), a column corresponding to a specific Thr for which a ligand shows "HB donor" could mean very different things if the interactor is the carbonyl from the backbone or the hydroxyl from the side chain.