Closed Le-Phung-Hien closed 10 months ago
Hello,
The --original_input
option is going to be removed, it works only if the ligand was prepared with an earlier version of meeko or explicity requesting openbabel as the chemoinformatics backend. It requires the user to keep track of the original input files and also a remark with an index mapping that is not included in the PDBQT you posted.
Unfortunately, mk_export.py
will not work on that PDBQT.
Is gnina removing the remarks in the input PDBQT file?
That what I get after running gnina. Actually I run gnina with the openbabel-generated pdbqt files, which doesnt have SMILES remark also...
REMARK 13 active torsions: REMARK status: ('A' for Active; 'I' for Inactive) REMARK 1 A between atoms: C_2 and C_3 REMARK 2 A between atoms: C_3 and C_4 REMARK 3 A between atoms: C_3 and C_39 REMARK 4 A between atoms: C_4 and C_5 REMARK 5 A between atoms: C_5 and C_6 REMARK 6 A between atoms: C_6 and C_8 REMARK 7 A between atoms: C_17 and O_18 REMARK 8 A between atoms: O_18 and C_19 REMARK 9 A between atoms: C_21 and C_22 REMARK 10 A between atoms: C_22 and O_23 REMARK 11 A between atoms: C_24 and O_25 REMARK 12 A between atoms: C_26 and O_27 REMARK 13 A between atoms: C_28 and O_29 ROOT HETATM 1 C UNK N 1 -2.531 0.189 -2.979 0.00 0.00 -0.002 C
I will try to make ligand with meeko to see if there will be differences.
Ah, openbabel does not add such remarks to the ligand PDBQT. Only meeko does.
I will try to make ligand with meeko
This should fix it!
Hi @Le-Phung-Hien,
User here, was in a similar situation and had to reindex the pdbqt atoms in a certain way for other types of calculations.. If you are not dealing with extra-large ligands, a direct sorting (ascending x, ascending y, ascending z) on both your input ligand pdbqt and meeko's pdbqt (as a template) could fix this.
The index mapping is only needed once per ligand. Just add the edited REMARK SMILES, REMARK SMILES IDX and REMARK H PARENT to the output pdbqt file. That way mk_export.py might be able to process the docking outputs you already have.
We built some similar index fixing tool before starting to use Meeko (thanks @diogomart again for the great tool!!). May not be the most efficient solution but something like this seemed to generate modified REMARKS that works for mk_export.py:
import numpy as np
class pdbqt_atom:
def __init__(self, atom_id, atom_coord):
"""
atom_id: int, as appeared in pdbqt file
atom_coord: ordered iterable of floats x, y, z
"""
self.atom_id = int(atom_id)
self.atom_coord = atom_coord
def parse_pdbqt(pdbqt_filename):
atom_list = []
with open(pdbqt_filename, "r") as f:
line = f.readline()
while line:
if line[:6] in ["ATOM ", "HETATM"]:
new_atom = pdbqt_atom(line[7:11], [float(line[30:38]), float(line[38:46]), float(line[46:54])])
atom_list.append(new_atom)
line = f.readline()
return atom_list
def direct_sort(input_list):
dtype = [('atom_id', 'int'), ('coord_x', 'float'), ('coord_y', 'float'), ('coord_z', 'float')]
structured_list = []
for atom in input_list:
structured_list.append((atom.atom_id, atom.atom_coord[0], atom.atom_coord[1], atom.atom_coord[2]))
structured_array = np.array(structured_list, dtype=dtype)
sorted_array = np.sort(structured_array, order=['coord_x', 'coord_y', 'coord_z'], kind="mergsort")
return sorted_array
def new_remarks(meeko_pdbqt_file, lookup_table):
remark_list = []
with open(meeko_pdbqt_file, "r") as f:
line = f.readline()
while line[:6]=="REMARK":
for remark_type in ["REMARK SMILES IDX ", "REMARK H PARENT "]:
if remark_type in line:
meeko_idx_mapping = [int(x) for x in line.replace(remark_type,"").split()]
n_idx = int(len(meeko_idx_mapping)/2)
smiles_ids, meeko_ids = np.reshape(meeko_idx_mapping, (n_idx, 2)).T
new_ids = []
for old_id in meeko_ids:
new_ids.append(lookup_table[old_id])
new_idx_mapping = [str(x) for x in np.transpose([smiles_ids, new_ids]).flatten()]
line = remark_type+" ".join(new_idx_mapping)+"\n"
remark_list.append(line)
line = f.readline()
return remark_list
# sorting atoms in both pdbqt files by ascending x, y, z
from_meeko = parse_pdbqt("Biperiden_meeko.pdbqt")
meeko_sorted = direct_sort(from_meeko)['atom_id']
from_obabel = parse_pdbqt("Biperiden_obabel.pdbqt")
obabel_sorted = direct_sort(from_obabel)['atom_id']
# building lookup table for index mapping
index_lookup = {}
for i, meeko_index in enumerate(meeko_sorted):
index_lookup[meeko_index] = obabel_sorted[i]
# printing new remarks with updated REMARK SMILES IDX and REMARK H PARENT
my_remarks = new_remarks("Biperiden_meeko.pdbqt", index_lookup)
for line in my_remarks:
print(line.replace("\n",""))
The only dependency is numpy. Sample files are attached... Biperiden_meeko.pdbqt.txt Biperiden_obabel.pdbqt.txt
Edits: Parsing coordinates by columns as suggested by diogomart ⌄
@rwxayheee that's cool stuff! Happy to see such advanced usage :-)
Just a comment on parsing x,y,z from PDB(QT) files, it's safer to parse
x = line[30:38]
y = line[38:46]
z = line[46:54]
because there might be no white space in characters 30 through 54, for example, if x
, y
, and z
are all -100
:
-100.000-100.000-100.000
@rwxayheee that's cool stuff! Happy to see such advanced usage :-)
Just a comment on parsing x,y,z from PDB(QT) files, it's safer to parse
x = line[30:38] y = line[38:46] z = line[46:54]
because there might be no white space in characters 30 through 54, for example, if
x
,y
, andz
are all-100
:-100.000-100.000-100.000
Got it! @diogomart, thank you a lot for the kind advice!!
Hi @rwxayheee @diogomart,
Thanks a lot!
Hi,
Thanks for the great tool.
I run docking with Gnina and the resulting pdbqt files actually does not have a SMILES remark.
MODEL 1 REMARK minimizedAffinity -7.60145617 REMARK CNNscore 0.457682222 REMARK CNNaffinity 6.62585068 REMARK 10 active torsions: REMARK status: ('A' for Active; 'I' for Inactive) REMARK 1 A between atoms: O_1 and C_19 REMARK 2 A between atoms: O_2 and C_20 REMARK 3 A between atoms: O_2 and C_30 REMARK 4 A between atoms: O_3 and C_25 REMARK 5 A between atoms: O_6 and C_33 REMARK 6 A between atoms: O_7 and C_38 REMARK 7 A between atoms: O_8 and C_39 REMARK 8 A between atoms: O_9 and C_41 REMARK 9 A between atoms: C_15 and C_25 REMARK 10 A between atoms: C_30 and C_35 ROOT HETATM 1 N UNK N 1 -7.991 3.174 -61.736 0.00 0.00 0.099 N
Is it advisable to use --original_input in this case? And can I ask if a SMILES string can be used for --original_input?
Thanks!