chemosim-lab / ProLIF

Interaction Fingerprints for protein-ligand complexes and more
https://prolif.readthedocs.io
Apache License 2.0
369 stars 70 forks source link

Details about cationic and anionic interactions #24

Open soumyosen opened 3 years ago

soumyosen commented 3 years ago

Hi,

trial trial1 I have attached here 2 images. In the first image, interactions are not considered as cation-anion interactions according to prolif analysis. But in the second image, it is considered. In the prolif document, it was mentioned about 4.5 A distance cutoff between +1 and -1 for a charged interaction. In my case, which atoms are considered as the position of +1 and -1. I run the job using just pdb files and run atoms.guess_bonds, followed by selection and finger print generation.

Thank you, Soumyo

cbouy commented 3 years ago

Hi @soumyosen,

You can retrieve which atoms are responsible for each interaction with fp.to_dataframe(return_atoms=True), and the resulting dataframe will contain, for each interaction between a pair of ligand and protein residues, the index of the ligand and protein atoms in these residues.

Now this is best viewed directly on a structure, and for this you can reuse the script displayed in this section of the documentation. You will need to install py3Dmol: pip install py3Dmol And then something like this should work in a Jupyter notebook:

import MDAnalysis as mda
import prolif as plf
import numpy as np
from rdkit import Chem
from rdkit import Geometry
import py3Dmol

# load topology and create selections
u = mda.Universe("input.pdb")
lig = u.select_atoms("resname LIG")
prot = u.select_atoms("protein")

colors = {
    "Cationic": "green",
    "Anionic": "purple",
}

# get lig-prot interactions with atom info
fp = plf.Fingerprint(["Cationic", "Anionic"])
fp.run(u.trajectory, lig, prot)
df = fp.to_dataframe(return_atoms=True)

#### Display interactions ####
# the section below should work without any modification

lmol = plf.Molecule.from_mda(lig)
pmol = plf.Molecule.from_mda(prot)

# JavaScript functions
resid_hover = """function(atom,viewer) {{
    if(!atom.label) {{
        atom.label = viewer.addLabel('{0}:'+atom.atom+atom.serial,
            {{position: atom, backgroundColor: 'mintcream', fontColor:'black'}});
    }}
}}"""
hover_func = """
function(atom,viewer) {
    if(!atom.label) {
        atom.label = viewer.addLabel(atom.interaction,
            {position: atom, backgroundColor: 'black', fontColor:'white'});
    }
}"""
unhover_func = """
function(atom,viewer) {
    if(atom.label) {
        viewer.removeLabel(atom.label);
        delete atom.label;
    }
}"""

v = py3Dmol.view(650, 600)
v.removeAllModels()

models = {}
mid = -1
for i, row in df.T.iterrows():
    lresid, presid, interaction = i
    lindex, pindex = row[0]
    lres = lmol[lresid]
    pres = pmol[presid]
    # set model ids for reusing later
    for resid, res, style in [(lresid, lres, {"colorscheme": "cyanCarbon"}),
                              (presid, pres, {})]:
        if resid not in models.keys():
            mid += 1
            v.addModel(Chem.MolToMolBlock(res), "sdf")
            model = v.getModel()
            model.setStyle({}, {"stick": style})
            # add residue label
            model.setHoverable({}, True, resid_hover.format(resid), unhover_func)
            models[resid] = mid
    # get coordinates for both points of the interaction
    p1 = lres.GetConformer().GetAtomPosition(lindex)
    p2 = pres.GetConformer().GetAtomPosition(pindex)
    # add interaction line
    v.addCylinder({"start": dict(x=p1.x, y=p1.y, z=p1.z),
                   "end":   dict(x=p2.x, y=p2.y, z=p2.z),
                   "color": colors[interaction],
                   "radius": .15,
                   "dashed": True,
                   "fromCap": 1,
                   "toCap": 1,
                  })
    # add label when hovering the middle of the dashed line by adding a dummy atom
    c = Geometry.Point3D(*plf.utils.get_centroid([p1, p2]))
    modelID = models[lresid]
    model = v.getModel(modelID)
    model.addAtoms([{"elem": 'Z',
                     "x": c.x, "y": c.y, "z": c.z,
                     "interaction": interaction}])
    model.setStyle({"interaction": interaction}, {"clicksphere": {"radius": .5}})
    model.setHoverable(
        {"interaction": interaction}, True,
        hover_func, unhover_func)

# show protein:
# first we need to reorder atoms as in the original MDAnalysis file.
# needed because the RDKitConverter reorders them when infering bond order
# and 3Dmol.js doesn't like when atoms from the same residue are spread accross the whole file
order = np.argsort([atom.GetIntProp("_MDAnalysis_index") for atom in pmol.GetAtoms()])
mol = Chem.RenumberAtoms(pmol, order.astype(int).tolist())
mol = Chem.RemoveAllHs(mol)
pdb = Chem.MolToPDBBlock(mol, flavor=0x20 | 0x10)
v.addModel(pdb, "pdb")
model = v.getModel()
model.setStyle({}, {"cartoon": {"style":"edged"}})
v.zoomTo({"model": list(models.values())})

If you prefer, it should also be possible to convert this information to a list of pymol selections/commands if you prefer using that software. Don't hesitate to ask if you need some help with that!

soumyosen commented 3 years ago

Thank you for the script, it works perfectly. So for the first image where prolif shows there is no charged interactions, the distance in N+ and O- is 3.1 A. Is there a chance that prolif measures distance between N+ and the other oxygen of the carboxylate group? Because that distance is 5.0 A.

cbouy commented 3 years ago

My bad, the script I gave is not going to help in the situation where you want to debug an interaction that should be detected but is not, sorry!

Is there a chance that prolif measures distance between N+ and the other oxygen of the carboxylate group?

Because you are using a PDB file it's possible that the choices made by PyMOL when assigning charges/bond orders to atoms of the carboxylate and guanidinium moieties are different from ProLIF. Technically all of the resonance structures are valid, but usually there's only one that is relevant for non-bonded interactions, and it might not have been picked up by ProLIF. To view both of your residues (and compare them with pymol), you could use this script :

import MDAnalysis as mda
import prolif as plf
import numpy as np
from rdkit import Chem
import py3Dmol

# load topology and create selections
u = mda.Universe("input.pdb")
lig = u.select_atoms("resname LIG")
prot = u.select_atoms("protein")
prot_res = "ARG42.A"  # change this to your residue of interest
v = py3Dmol.view(650, 600)

# display ligand and residue
v.removeAllModels()
lmol = plf.Molecule.from_mda(lig)
pmol = plf.Molecule.from_mda(prot)
pres = pmol[prot_res]
for resid, res, style in [(lmol, {"colorscheme": "cyanCarbon"}),
                          (pres, {})]:
        v.addModel(Chem.MolToMolBlock(res), "sdf")
        model = v.getModel()
        model.setStyle({}, {"stick": style})

# show protein:
order = np.argsort([atom.GetIntProp("_MDAnalysis_index") for atom in pmol.GetAtoms()])
mol = Chem.RenumberAtoms(pmol, order.astype(int).tolist())
mol = Chem.RemoveAllHs(mol)
pdb = Chem.MolToPDBBlock(mol, flavor=0x20 | 0x10)
v.addModel(pdb, "pdb")
model = v.getModel()
model.setStyle({}, {"cartoon": {"style":"edged"}})
v.zoomTo({"model": [0, 1]})

Because the interactions are not detected despite a distance between N+ and O- below the threshold, I'm guessing the charge assignment will differ between prolif and pymol.

soumyosen commented 3 years ago

Yes, I am also guessing the same that the charge assignment differs between prolif and pymol. The charge assignment in pymol matches with pdb N1+ or O1-. But I think prolif is not matching with that.

rwxayheee commented 3 years ago

Hello Soumyo and Cédric,

I'm a new user of ProLIF and I have a very similar question to this. I wonder how the formal charge is handled in ProLIF for residue/ligands with resonance structures like ARG or carboxylate groups. If my understanding is correct, ProLIF uses SMARTS strings to describe molecules, in which the atomic formal charge of a residue or ligand is explicitly indicated. Such charge assignment is done by the RDKit module in ProLIF. There're ways to override the charge assignment for ligand molecules in RDKit. For residue-residue interactions like the interaction between ARG and GLU, maybe we could define the interaction in a different way (by the atom type and distance only). What do you guys think?

Best, Amy

cbouy commented 3 years ago

The charge assignment in pymol matches with pdb N1+ or O1-. But I think prolif is not matching with that.

That's very likely, MDAnalysis doesn't keep the information about formal charges on atoms, so that information is lost.

I wonder how the formal charge is handled in ProLIF for residue/ligands with resonance structures like ARG or carboxylate groups. [...] There're ways to override the charge assignment for ligand molecules in RDKit.

The answer is unfortunately very simple: we don't handle multiple resonance structures. However RDKit has a function that enumerates all possible resonance structures: Chem.ResonanceMolSupplier. With that you could try to create a fingerprint for each resonance structure and then concatenate them.

import MDAnalysis as mda
import prolif
import numpy as np
from rdkit import Chem

u = mda.Universe("complex.pdb")
u.atoms.guess_bonds()
fp = prolif.Fingerprint()

# generate fp for each combination of resonance structures
i = 0
ifp = []
for lig_mol in Chem.ResonanceMolSupplier(u.select_atoms("resname LIG").convert_to("RDKIT")):
    lig = prolif.Molecule(lig_mol)
    for prot_mol in Chem.ResonanceMolSupplier(u.select_atoms("protein").convert_to("RDKIT")):
        prot = prolif.Molecule(prot_mol)
        data = fp.generate(lig, prot, return_atoms=True)
        data["Frame"] = i
        i += 1
        ifp.append(data)

df = prolif.to_dataframe(ifp, fp.interactions.keys())

# concatenate
df2 = (df
       .sum(axis=0)
       .astype(bool)
       .to_frame()
       .T)

If my understanding is correct, ProLIF uses SMARTS strings to describe molecules, in which the atomic formal charge of a residue or ligand is explicitly indicated.

ProLIF uses SMARTS strings to describe interactions, not molecules. The SMARTS strings are used to run substructure searches on each "residue" inside your protein and ligand, and then computes distances/angles when there are matches. Inside a SMARTS string you can indicate many informations including elements, charges, aromaticity, ring membership, neighbor atoms and their bonds, and with this you can very precisely define the nature of an interaction. For a full list of SMARTS features see this DAYLIGHT page and the RDKit implementation.

Such charge assignment is done by the RDKit module in ProLIF.

Charge and bond order assignment for PDB and similar files is done by MDAnalysis: when you use the prolif.Molecule.from_mda or prolif.Fingerprint.run functions, the input MDAnalysis universe is converted to an RDKit mol by MDAnalysis (it's usually the most time consuming step), then the RDKit mol is quickly transformed in a ProLIF molecule (which behaves exactly like an RDKit molecule, except that you can also easily access each residue as an individual molecule).

For residue-residue interactions like the interaction between ARG and GLU, maybe we could define the interaction in a different way (by the atom type and distance only). What do you guys think?

Yes you could also handle this with a custom interaction. The SMARTS patterns used would need to match the guanidinium moiety of ARG and the carboxylate of GLU, but instead of measuring the distance between the (+) and (-) charges, you could also include the distances between the neutral oxygen and nitrogen atoms that matched those SMARTS patterns, and if one of those distances is below the threshold then they are "interacting". There are some pointers on how to define a ProLIF interaction here.

rwxayheee commented 3 years ago

Hi Cédric, Thank you so much for your kind reply! That is super informative and helpful. Yes I'm using the Chem.ResonanceMolSupplier method to generate possible resonance structures, sanitize and override the default by the desired resonance forms. I'm also a beginner of MDA and RDKit and I learned a lot from the script you wrote. Thanks again for your help! Amy

soumyosen commented 3 years ago

Hi Cedric, This whole description is very helpful. Thank you, Soumyo