forlilab / Meeko

Interface for AutoDock, molecule parameterization
https://meeko.readthedocs.io/
GNU Lesser General Public License v2.1
206 stars 49 forks source link

Covalent docking using autodock gpu with ligand prepared by `mk_prepare_ligand.py` #104

Open lovenicole opened 7 months ago

lovenicole commented 7 months ago

Hi, I want to do covalent docking using autodock gpu with ligand prepared by mk_prepare_ligand.py. However, I need help.

1) I followed the instruction below. https://github.com/ccsb-scripps/AutoDock-GPU/issues/93#issuecomment-1160961477 The --tether_smarts option input was "CCSCC(=O)N" and the --tether_smarts_indices was 1 2. However, I got an error like this.

warnings.warn("SMARTS pattern didn't match any atoms", RuntimeWarning)

2) Thus, I changed the "tether_smarts" to the only acrylamide pattern without S. I prepared the ligand using the command below. mk_prepare_ligand.py -i $id.sdf -o $id.pdbqt --receptor KRAS/8x6r_A_1.pdb --rec_residue "A:CYS:12" --tether_smarts "C=CC(=O)N" --tether_smarts_indices 1 2

And it works. Thus I did covalent docking with autodock gpu like the command below. autodock_gpu_128wi --lfile $id.pdbqt --ffile KRAS/8x6r_A_1.maps.fld --resnam $id --nrun 20

However, If I see the result dlg file, there is no atom pair with the distance below 1.9 (covalent docking).

I appreciate it in advance.

rwxayheee commented 7 months ago

Hi @lovenicole According to README, --tether_smarts_indices are used to specify the anchoring atoms, which need to be C-alpha and C-beta at the moment.

If "CCSCC(=O)N" does not return a match, maybe your ligand file doesn't contain these atoms because they are in the pre-reaction form (acrylamide)? The covalent ligand file (all $id.sdf) should be in the after-inhibition form and has to include C-alpha and C-beta. It would be helpful if you could post your input and output files so other people can reproduce the issue :>

lovenicole commented 7 months ago

Thank you very much for the reply.

Yes, I used pre-reaction form (just ligand in it).

How can I make after-inhibition form? I have an docked pdb file (with receptor and ligand in it). Can I use it?

rwxayheee commented 7 months ago

Hi @lovenicole You could edit the structures (add atoms, alter bond order) in software like PyMOL, as long as the generated ligand-residue conjugate has reasonable bond lengths and angles. If the ligand was docked non-covalently, the bond length would be longer and may not be a good starting structure for covalent docking. The starting conformations and dihedrals do not need to be perfect, though. I remember they don't need to align with the original receptor's sidechain, as they are subject to change in docking like other ligand torsions.

Alternatively, you could generate such conjugates from Smiles (after replacing one "C=CC(=O)N" by "CCSCC(=O)N"). There are programs that support conversion from Smiles to 3D structures. You can also do this in RDKit :>

lovenicole commented 7 months ago

Thank you very much. However, I need help determining which atoms or bond order needs to be added.

In CovalentBuilder, I got this error. image

I used the test.pdb file, made from autodock-gpu dlg file (without any covalent option). Because pdb file is not allowed to be attached in github, I changed the file extension to txt. test.txt

obabel -i pdb test.pdb -O test.sdf mk_prepare_ligand.py -i test.sdf -o test.pdbqt --receptor 8x6r_A_1.pdb --rec_residue "A:CYS:12" --tether_smarts "CCSCC(=O)N" --tether_smarts_indices 1 2

Could you help?

rwxayheee commented 7 months ago

Hi @lovenicole There are potentially two issues with the ligand pdb file (test.pdb). First, the format doesn't look like a standard PDB or PDBQT file. Also, there're two additional columns before atom ID. You can notice the difference when comparing the ligand part with the receptor part you provided in test.txt. They don't line up:

Screenshot 2024-04-30 at 12 52 15 AM

I can't determine what the the last two columns of ligand part are. The second last column add up to -10.05, so maybe a breakdown of scores? the last column add up to 0.05, are they partial charges? Would you please provide a little more details on how you generated this ligand PDB? I figured out now. The last two columns are the VdW and Elec columns in .dlg. They shouldn't be in a PDB file.

The second issue is the ligand PDB doesn't contain hydrogens and thus the SDF file has potentially incorrect valence. You can try adding the hydrogens with obabel. But if you have better sources of the ligand input files, which have specified valence, or contain hydrogens, they are better for mk_prepare_ligand.py. The valence info is important and ensures that the ligand & your smarts query are interpreted correctly.

Have you considered generating the ligand conformations and prepare ligands from Smiles? This is possible in RDKit and with meeko's python API. I can provide an example, if you're interested (tomorrow when i have a bit more time :>

rwxayheee commented 7 months ago
Have you considered generating the ligand conformations and prepare ligands from Smiles? This is possible in RDKit and with meeko's python API (?)

I did something similar in the past, but for very simple warheads like aldehydes, nitriles, etc. Below is how I edited some covalent ligand editing using RDKit, and I hope this might be helpful. The following (as a Python script, or simply dump the lines to a Jupyter notebook) generates an SDF file for the post-reaction, residue-covalent ligand conjugate from Smiles string of an acrylamide ligand, Osimertinib.

import rdkit
print("rdkit version:", rdkit.__version__)
from rdkit import Chem
from rdkit.Chem import AllChem

# Osimertinib
smi = "CN1C=C(C2=CC=CC=C21)C3=NC(=NC=C3)NC4=C(C=C(C(=C4)NC(=O)C=C)N(C)CCN(C)C)OC"
mol = Chem.MolFromSmiles(smi)

# Replace acrylamide by covalent conjugate
patt = Chem.MolFromSmiles('N(C(=O)C=C)')
repl = Chem.MolFromSmiles('N(C(=O)CCSCC)')
covmol = AllChem.ReplaceSubstructs(mol,patt,repl)

# Regenerate the covalent ligand from the edited Smiles
covsmi = Chem.MolToSmiles(covmol[0])
print("Smiles of the covalent ligand-residue conjugate:\n", covsmi)
covlig = Chem.MolFromSmiles(covsmi)

# Add hydrogens
covligH = Chem.AddHs(covlig)

# Get initial conformation
params = AllChem.ETKDGv3()
AllChem.EmbedMolecule(covligH, params = params)

# Minimization
covlig3d = Chem.Mol(covligH,confId=0)
prop = AllChem.MMFFGetMoleculeProperties(covlig3d)
ff = AllChem.MMFFGetMoleculeForceField(covlig3d, prop)
e_0 = ff.CalcEnergy()
print("MMFF energy before minimization:", e_0)
ff.Minimize()
e_min = ff.CalcEnergy()
print("MMFF energy after minimization:", e_min)

# Write an SDF file
with Chem.SDWriter('covlig.sdf') as w:
    w.write(covlig3d)

Output:

rdkit version: 2023.09.4
Smiles of the covalent ligand-residue conjugate:
 CCSCCC(=O)Nc1cc(Nc2nccc(-c3cn(C)c4ccccc34)n2)c(OC)cc1N(C)CCN(C)C
MMFF energy before minimization: 95.122853461888
MMFF energy after minimization: 26.65779006425961

I'm unfamiliar with the API usage for covalent ligand preparation >.< the dev and other experts might be able to help us from here. Two more things I wanted to note:

  1. The --tether_smarts should be "CCSCCC(=O)N" mk_prepare_ligand.py -i covlig.sdf -o covlig.pdbqt --receptor test_receptor.pdb --rec_residue "A:CYS:12" --tether_smarts "CCSCCC(=O)N" --tether_smarts_indices 1 2

  2. Keep an eye on your post-reaction conjugates for potential stereoisomers because some reactions create new chiral centers. The stereoisomers might need to be considered different ligands for docking

lovenicole commented 7 months ago

Thank you very much for the detailed protocol.

There is no whole receptor structure in the rdkit part code you offered. So I just need to replace the acrylamide pattern with 'N(C(=O)CCSCC).' Is this the standard procedure, or do I need the whole receptor structure also in the rdkit part in the actual practice?

I really really appreciate your detailed description.

rwxayheee commented 7 months ago

There is no whole receptor structure in the rdkit part code you offered.

I used the receptor part of test.txt you provided. Here's my test_receptor.pdb: test_receptor.pdb.txt

So I just need to replace the acrylamide pattern with 'N(C(=O)CCSCC).'

No, in my scripts I used the very generic smarts to find the acrylamide and attach the covalent conjugates. The replacement step should work for all acrylamides, not limiting to Osimertinib. YES, I misinterpreted what you said, but I understand now. Yes, it's basically just replacing acrylamide with the covalent conjugate.

To customize Python scripts for your own acrylamide compounds, modify only the value of smi. Provide the Smiles of your acrylamide ligands as smi :)

# Osimertinib
smi = "CN1C=C(C2=CC=CC=C21)C3=NC(=NC=C3)NC4=C(C=C(C(=C4)NC(=O)C=C)N(C)CCN(C)C)OC"

As an alternate to ReplaceSubstructs, you could just provide the Smiles of the conjugate form, if your ligands are collected in a certain pattern (Like they are a family of mostly similar structures with some different R groups/linkers that are arranged in predictable positions.)

Is this the standard procedure, or do I need the whole receptor structure also in the rdkit part in the actual practice?

No the RDKit part is just my procedure. I used this procedure for a similar project. The RDKit part is independent from the receptor structure. When preparing the covalent conjugate in RDKit, the only thing we need to know is what type of residue is the target residue (Cys in this case).

For mk_prepare_ligand.py, I think you are right that the receptor pdb doesn't need to be the entire receptor. I reduce the pdb file to atoms that belong to CYS 12, and it still generates the same ligand.pdbqt file. The purpose of providing the receptor structure is to give the coordinates of C-alpha and C-beta. Their coordinates are used as anchors and will not change in docking. You will see that the initial coordinates of the ligand in covlig.pdbqt now align with the residue in the receptor file.

Hope this helps!

lovenicole commented 7 months ago

Thank you! I succeeded thanks to you.

I have one last question. I can't see any binding energy in standard output log or pdbqt file. If I want to get binding energy encompassing this covalent binding and noncovalent binding, does it work to put the covlig.pdbqt file in autodock-gpu?

I'm confused because if I do that way, I don't think the covalent binding energy is considered (the output dlg file below). image

rwxayheee commented 7 months ago

Hi @lovenicole I'm unfamiliar with how the energy is computed in AutoDock-GPU for covalent docking. This "modeling as flexible residue" method is discussed in this paper, but I'm not very sure about the definition of the energies without looking at the codes. In my understanding, the output provides the interaction energy between the receptor and ligand, and C-alpha and C-beta are excluded from intermolecular energy. However I'm not 100% sure if they still contribute to internal energy. In different programs, there are different ways to implement and score the covalent docking poses. It would be nice if we can get help from the devs and you could also make a new post in AutoDock-GPU and describe a little bit what you expect to get, let us know how would you like the covalent binding energy to be defined

lovenicole commented 7 months ago

I think I need to make a new post in AutoDock-GPU about it.

Thank you very much! Your instruction really helped me a lot. Hope you have a nice day :)

Alan-Abdallah commented 3 months ago

you can also use ReactionFromSmarts from rdkit.

from rdkit import Chem
from rdkit.Chem import AllChem,rdDistGeom,rdChemReactions
import pandas as pd 
import argparse
from tqdm import tqdm
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument('-c','--csv', type=str, help='input csv file')
args = parser.parse_args()
csv_input = args.csv
compounds = pd.read_csv(csv_input)
output_name = csv_input.split('.')[0]
output_dir = Path(output_name)
output_dir.mkdir()

#other reactions can also be added here
reactions={}
#reactions['nucleophilic_substitution'] = rdChemReactions.ReactionFromSmarts('[c:1]-[F,Br,Cl,I].[S:2]-[CH2:3]-[CH3:4]>>[c:1]-[S:2]-[CH2:3]-[CH3:4]')
#reactions['nucleophilic_substitution'].Initialize()
reactions['michael_addition'] = rdChemReactions.ReactionFromSmarts('[CH1,CH2:1]=[C:2]-[C,S:3](=[O:4]).[S:5]-[CH2:6]-[CH3:7]>>[CH3:7]-[CH2:6]-[S:5]-[C:1]-[C:2]-[C,S:3](=[O:4])')
reactions['michael_addition'].Initialize()

cys = Chem.MolFromSmiles('CCS')
norm_compounds = {'name': [], 'smiles': [], 'reaction': [], 'mol': []}
cov_compounds = {'name': [], 'smiles': [], 'reaction': [], 'mol': []}

for _, row in tqdm(compounds.iterrows(), total=len(compounds), desc='Processing SMILES'):
    mol = Chem.MolFromSmiles(row['SMILES'])
    mol_name = row['NAME']
    mol.SetProp('_Name',mol_name)

    for rxn_name, rxn in reactions.items():    
        products = rxn.RunReactants((mol, cys))
        if len(products) > 0:

            norm_compounds['name'].append(mol_name)
            norm_compounds['smiles'].append(Chem.MolToSmiles(mol))
            norm_compounds['reaction'].append(rxn_name)

            mol = Chem.AddHs(mol)
            rdDistGeom.EmbedMultipleConfs(mol,numThreads=0,numConfs=3)
            AllChem.MMFFOptimizeMolecule(mol, maxIters=500)
            norm_compounds['mol'].append(mol)

            for i,product in enumerate(products):
                cov_mol = Chem.MolFromSmiles(Chem.MolToSmiles(product[0]))
                cov_name = mol_name+'_'+chr(65+i)
                cov_mol.SetProp('_Name',cov_name)

                cov_compounds['name'].append(cov_name)
                cov_compounds['smiles'].append(Chem.MolToSmiles(cov_mol))
                cov_compounds['reaction'].append(rxn_name)

                cov_mol = Chem.AddHs(cov_mol)
                rdDistGeom.EmbedMultipleConfs(cov_mol,numThreads=0,numConfs=3)
                AllChem.MMFFOptimizeMolecule(cov_mol, maxIters=500)
                cov_compounds['mol'].append(cov_mol)

with Chem.SDWriter(f'{output_dir/output_name}_normal.sdf') as w:  
    for m in tqdm(norm_compounds['mol'], desc='Writing normal compounds'):
        w.write(m)
with Chem.SDWriter(f'{output_dir/output_name}_covalent.sdf') as w:
    for m in tqdm(cov_compounds['mol'], desc='Writing covalent compounds'):
        w.write(m)

This script takes in a csv (NAME, SMILES) and outputs all unique mols before and after the specified reactions. e.g. compounds.csv > compounds_normal.sdf + compounds_covalent.sdf

compounds_covalent.sdf can then be used with meeko to get the pdbqt-files for docking: mk_prepare_ligand.py -i compounds_covalent.sdf --receptor receptor.pdb --rec_residue "A:CYS:XXX" --tether_smarts '[CH3][CH2][S][C,c]' --tether_smarts_indices 1 2 --multimol_outdir {output_dir}

Hope this helps someone :)