Closed forcefield closed 11 months ago
This will give you something similar to the old to_dataframe(return_atoms=True)
(assuming you're not doing protein-protein interactions):
import pandas as pd
data = []
index = []
for i, frame_ifp in fp.ifp.items():
index.append(i)
frame_data = {}
for (_, resid), residue_ifp in frame_ifp.items():
for int_name, metadata_tuple in residue_ifp.items():
frame_data[(str(resid), int_name)] = [
{
"ligand": metadata["indices"]["ligand"],
"protein": metadata["parent_indices"]["protein"],
}
for metadata in metadata_tuple
]
data.append(frame_data)
df = pd.DataFrame(data, index=pd.Index(index, name="Frame"))
df.columns = pd.MultiIndex.from_tuples(df.columns, names=["residue", "interaction"])
df
residue | VAL200.A | ALA216.A | ... | VAL210.A | PRO338.B |
---|---|---|---|---|---|
interaction | VdWContact | Hydrophobic | ... | Hydrophobic | VdWContact |
Frame | |||||
0 | [{'ligand': (35,), 'protein': (1978,)}] | [{'ligand': (0,), 'protein': (2222,)}] | ... | NaN | NaN |
... | ... | ... | ... | ... | ... |
240 | [{'ligand': (28,), 'protein': (1976,)}] | [{'ligand': (0,), 'protein': (2222,)}] | ... | NaN | NaN |
I've made that code snippet compatible with Fingerprint(count=True)
but if you're not interested in that you can replace
frame_data[(str(resid), int_name)] = [
{
"ligand": metadata["indices"]["ligand"],
"protein": metadata["parent_indices"]["protein"],
}
for metadata in metadata_tuple
]
with
metadata = metadata_tuple[0]
frame_data[(str(resid), int_name)] = {
"ligand": metadata["indices"]["ligand"],
"protein": metadata["parent_indices"]["protein"],
}
From there you can then use the pandas indexing commands at different levels to do what you need:
df.xs("PHE331.B", level="residue", axis=1).head(3)
Hydrophobic | PiStacking | VdWContact |
---|---|---|
[{'ligand': (0,), 'protein': (2668,)}] | [{'ligand': (3, 4, 5, 6, 7, 42), 'protein': (2... | [{'ligand': (6,), 'protein': (2669,)}] |
[{'ligand': (0,), 'protein': (2668,)}] | NaN | [{'ligand': (0,), 'protein': (2671,)}] |
[{'ligand': (0,), 'protein': (2668,)}] | NaN | NaN |
df.xs("HBDonor", level="interaction", axis=1).head(3)
ASP129.A | SER212.A | THR134.A |
---|---|---|
[{'ligand': (13, 52), 'protein': (1199,)}] | [{'ligand': (1, 44), 'protein': (2180,)}] | NaN |
[{'ligand': (13, 52), 'protein': (1199,)}] | [{'ligand': (1, 44), 'protein': (2180,)}] | NaN |
[{'ligand': (13, 52), 'protein': (1199,)}] | NaN | NaN |
In version 2.0.0, the return_atoms argument has been removed from Fingerprint.to_dataframe. But there does not seem to be an alternative interface to extract the atoms that participate in specific interactions. I had to write something like below:
Is there a better way to get the same information as the old Fingerprint.to_dataframe( return_atoms=True)?