chemosim-lab / ProLIF

Interaction Fingerprints for protein-ligand complexes and more
https://prolif.readthedocs.io
Apache License 2.0
336 stars 66 forks source link

Using compound names instead of pose number as X axis of interaction barcode. #167

Closed Le-Phung-Hien closed 7 months ago

Le-Phung-Hien commented 7 months ago

Hi,

I am running a script to get interaction barcode of different compounds (in a single sdf file) to a protein (in a pdb file).

The script run well, but I would like to know if there is a way to use compound names instead of pose number as X axis of interaction barcode? Also, is it possible to use 1-letter code instead of 3-letter code for the residues?

Thank you very much.

Hien

dockThor_28_in_final_best

import MDAnalysis as mda
import prolif as plf
import sys
import argparse
import os
from IPython.display import display
from matplotlib import pyplot as plt
from rdkit import Chem

# Create an argument parser
parser = argparse.ArgumentParser(description='Protein-ligand interaction barcoding')

# Add arguments
parser.add_argument('-p', '--protein', type=str, help='Path to the protein file')
parser.add_argument('-l', '--ligands', type=str, help='Path to the ligand file')
parser.add_argument('-o', '--output', type=str, help='Output image name')
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("-md", "--mdanalysis", action="store_true", help='Using mdanalysis to parse protein')
group.add_argument("-rd", "--rdkit", action="store_true", help='Using rdkit to parse protein')
# Parse the command-line arguments
args = parser.parse_args()

protein = args.protein
ligands = args.ligands
output = args.output
# Read the protein file
if args.mdanalysis:
    u = mda.Universe(protein)
    protein_mol = plf.Molecule.from_mda(u)
elif args.rdkit: 
    rdkit_prot = Chem.MolFromPDBFile(protein, removeHs=False)
    protein_mol = plf.Molecule(rdkit_prot)
# Load ligands
poses_path = str(os.path.abspath(ligands))
if ligands.endswith('.sdf'):
    pose_iterable = plf.sdf_supplier(poses_path)
if ligands.endswith('.mol2'):
    pose_iterable = plf.mol2_supplier(poses_path)

# Use default interactions
fp = plf.Fingerprint()

# Run on your poses
fp.run_from_iterable(pose_iterable, protein_mol)

# show dataframe
df = fp.to_dataframe(index_col="Compound")
display(df)

# show barcode
fp.plot_barcode(xlabel="Pose")
plt.savefig(output, dpi=600)
plt.show()

# show first compound's interactions
fp.plot_lignetwork(pose_iterable[0])
cbouy commented 7 months ago

Hi Hien,

fp.plot_barcode(xlabel="Pose") returns a matplotlib Axes object from which you can manipulate the xticks and xticklabels attributes to do what you want (see their docs here).

As for switching the residue names to be 1-letter code, you can extract the current labels with the same attributes as above (just y instead of x), and if you extract the first 3 characters you can then use MDAnalysis to convert the 3-letter code to 1-letter with MDAnalysis.lib.util.inverse_aa_codes or your own conversion table.