chemosim-lab / ProLIF

Interaction Fingerprints for protein-ligand complexes and more
https://prolif.readthedocs.io
Apache License 2.0
371 stars 70 forks source link

Fixing ligand connectivity on the fly #202

Open asiomchen opened 7 months ago

asiomchen commented 7 months ago

I'm trying to calculate ligand-aminoacid interaction from OpenMM simulation from pdb topology and xtc trajectory. Ligand structure is show below image

But due to the use of pdb, MDAnalysis makes the bonds guess and it is not correct, leading to the structure below:

u = mda.Universe("mini_system.pdb", "mini_system.xtc")
ligand = u.select_atoms("resname LIG")
protein = u.select_atoms("not resname LIG")
protein.guess_bonds()
ligand.guess_bonds()
ligand_plf = plf.Molecule.from_mda(ligand)
plf.display_residues(ligand_plf, size=(400,400), mols_per_row=1)

image

As a result salt bridge between wrong oxygen and arginine is detected image

To fix that I was trying to just hardcode modification which is needed to fix bonding

def fix_bonds(ligand_mol):
    pattern = Chem.MolFromSmarts("[O-:1][C:2]=[C:3][C:4]=[O:5]")
    matches = ligand_mol.GetSubstructMatch(pattern)
    if matches:
        print("Fixing bonds")
        atom_1, atom_2, atom_3, atom_4, atom_5 = matches
        ligand_mol.GetBondBetweenAtoms(atom_1, atom_2).SetBondType(Chem.rdchem.BondType.DOUBLE)
        ligand_mol.GetBondBetweenAtoms(atom_2, atom_3).SetBondType(Chem.rdchem.BondType.SINGLE)
        ligand_mol.GetBondBetweenAtoms(atom_3, atom_4).SetBondType(Chem.rdchem.BondType.DOUBLE)
        ligand_mol.GetBondBetweenAtoms(atom_4, atom_5).SetBondType(Chem.rdchem.BondType.SINGLE)
        atom_1 = ligand_mol.GetAtomWithIdx(atom_1)
        atom_5 = ligand_mol.GetAtomWithIdx(atom_5)
        atom_1.SetFormalCharge(0)
        atom_5.SetFormalCharge(-1)
        Chem.Kekulize(ligand_mol, clearAromaticFlags=True)
    else:   
        print("No pattern found")
    return ligand_mol

This solution work well for visualization, and when used in Molecule like this:

class Molecule(plf.Molecule):
    @classmethod
    def from_mda(cls, *args, **kwargs):
        mol = super().from_mda(*args, **kwargs)
        mol = fix_bonds(mol)
        mol = cls.from_rdkit(mol)
        return mol
plf.Molecule = Molecule

This code produces correct visuals:

auto_fixed = Molecule.from_mda(ligand)
plf.display_residues(auto_fixed, size=(400,400), mols_per_row=1)

image

But when used in Fingerprint it still detect wrong interaction, but this time with the good molecule visualization

fp = plf.Fingerprint(interactions=["Anionic"])
fp.run(u.trajectory, ligand, protein, n_jobs=1, progress=False)
fp.plot_lignetwork(auto_fixed)

image

So my question is, @cbouy does more straightforward way to ensure good ligand bond orders exists?Or maybe you have some clues why my solution is not working? I believe that this could potentially be a good edition to prolif's functionality

Whole notebook and files are available as a issue.zip

cbouy commented 7 months ago

The following seems to work, the only caveat is that you'll need to run the analysis with n_jobs=1:

import prolif as plf
import MDAnalysis as mda
from rdkit import Chem
from rdkit.Chem import AllChem

u = mda.Universe("mini_system.pdb", "mini_system.xtc")
ligand = u.select_atoms("resname LIG")
protein = u.select_atoms("not resname LIG")
protein.guess_bonds()
ligand.guess_bonds()

lig_template = Chem.MolFromSmiles("Cc1ncnc(C(=O)N2CCCCC2)c1[O-]")
original_from_mda = plf.Molecule.from_mda

def from_mda(atomgroup, **kwargs):
    if atomgroup is ligand:
        raw_mol = atomgroup.convert_to.rdkit(NoImplicit=False, **kwargs)
        with plf.utils.catch_rdkit_logs():
            mol = AllChem.AssignBondOrdersFromTemplate(lig_template, raw_mol)
        return plf.Molecule(mol)
    return original_from_mda(atomgroup, **kwargs)

plf.Molecule.from_mda = from_mda

And you'll have to lower the threshold in the lignetwork plot to 0.2.

Not sure why your original solution doesn't work though...

Anyway, I've made a PR in MDAnalysis (https://github.com/MDAnalysis/mdanalysis/pull/4305) some time ago to allow bypassing the step that infers bond orders and charges and use your own template molecule as shown here, hopefully it can get reviewed soon so that we won't need this dirty workaround!