mk_export.py: Using --original_input for gnina pdbqt results

Le-Phung-Hien commented 10 months ago

Hi,

Thanks for the great tool.

I run docking with Gnina and the resulting pdbqt files actually does not have a SMILES remark.

MODEL 1 REMARK minimizedAffinity -7.60145617 REMARK CNNscore 0.457682222 REMARK CNNaffinity 6.62585068 REMARK 10 active torsions: REMARK status: ('A' for Active; 'I' for Inactive) REMARK 1 A between atoms: O_1 and C_19 REMARK 2 A between atoms: O_2 and C_20 REMARK 3 A between atoms: O_2 and C_30 REMARK 4 A between atoms: O_3 and C_25 REMARK 5 A between atoms: O_6 and C_33 REMARK 6 A between atoms: O_7 and C_38 REMARK 7 A between atoms: O_8 and C_39 REMARK 8 A between atoms: O_9 and C_41 REMARK 9 A between atoms: C_15 and C_25 REMARK 10 A between atoms: C_30 and C_35 ROOT HETATM 1 N UNK N 1 -7.991 3.174 -61.736 0.00 0.00 0.099 N

Is it advisable to use --original_input in this case? And can I ask if a SMILES string can be used for --original_input?

Thanks!

diogomart commented 10 months ago

Hello,

The --original_input option is going to be removed, it works only if the ligand was prepared with an earlier version of meeko or explicity requesting openbabel as the chemoinformatics backend. It requires the user to keep track of the original input files and also a remark with an index mapping that is not included in the PDBQT you posted.

Unfortunately, mk_export.py will not work on that PDBQT. Is gnina removing the remarks in the input PDBQT file?

Le-Phung-Hien commented 10 months ago

That what I get after running gnina. Actually I run gnina with the openbabel-generated pdbqt files, which doesnt have SMILES remark also...

REMARK 13 active torsions: REMARK status: ('A' for Active; 'I' for Inactive) REMARK 1 A between atoms: C_2 and C_3 REMARK 2 A between atoms: C_3 and C_4 REMARK 3 A between atoms: C_3 and C_39 REMARK 4 A between atoms: C_4 and C_5 REMARK 5 A between atoms: C_5 and C_6 REMARK 6 A between atoms: C_6 and C_8 REMARK 7 A between atoms: C_17 and O_18 REMARK 8 A between atoms: O_18 and C_19 REMARK 9 A between atoms: C_21 and C_22 REMARK 10 A between atoms: C_22 and O_23 REMARK 11 A between atoms: C_24 and O_25 REMARK 12 A between atoms: C_26 and O_27 REMARK 13 A between atoms: C_28 and O_29 ROOT HETATM 1 C UNK N 1 -2.531 0.189 -2.979 0.00 0.00 -0.002 C

I will try to make ligand with meeko to see if there will be differences.

diogomart commented 10 months ago

Ah, openbabel does not add such remarks to the ligand PDBQT. Only meeko does.

I will try to make ligand with meeko

This should fix it!

rwxayheee commented 10 months ago

Hi @Le-Phung-Hien,

User here, was in a similar situation and had to reindex the pdbqt atoms in a certain way for other types of calculations.. If you are not dealing with extra-large ligands, a direct sorting (ascending x, ascending y, ascending z) on both your input ligand pdbqt and meeko's pdbqt (as a template) could fix this.

The index mapping is only needed once per ligand. Just add the edited REMARK SMILES, REMARK SMILES IDX and REMARK H PARENT to the output pdbqt file. That way mk_export.py might be able to process the docking outputs you already have.

We built some similar index fixing tool before starting to use Meeko (thanks @diogomart again for the great tool!!). May not be the most efficient solution but something like this seemed to generate modified REMARKS that works for mk_export.py:

import numpy as np

class pdbqt_atom:

    def __init__(self, atom_id, atom_coord):
        """
        atom_id: int, as appeared in pdbqt file
        atom_coord: ordered iterable of floats x, y, z
        """
        self.atom_id = int(atom_id)
        self.atom_coord = atom_coord

def parse_pdbqt(pdbqt_filename):

    atom_list = []

    with open(pdbqt_filename, "r") as f:
        line = f.readline()
        while line:
            if line[:6] in ["ATOM  ", "HETATM"]:
                new_atom = pdbqt_atom(line[7:11], [float(line[30:38]), float(line[38:46]), float(line[46:54])])
                atom_list.append(new_atom)
            line = f.readline()

    return atom_list

def direct_sort(input_list):

    dtype = [('atom_id', 'int'), ('coord_x', 'float'), ('coord_y', 'float'), ('coord_z', 'float')]

    structured_list = []

    for atom in input_list:
        structured_list.append((atom.atom_id, atom.atom_coord[0], atom.atom_coord[1], atom.atom_coord[2]))

    structured_array = np.array(structured_list, dtype=dtype)
    sorted_array = np.sort(structured_array, order=['coord_x', 'coord_y', 'coord_z'], kind="mergsort")

    return sorted_array

def new_remarks(meeko_pdbqt_file, lookup_table):

    remark_list = []

    with open(meeko_pdbqt_file, "r") as f:
        line = f.readline()
        while line[:6]=="REMARK":
            for remark_type in ["REMARK SMILES IDX ", "REMARK H PARENT "]:
                if remark_type in line:
                    meeko_idx_mapping = [int(x) for x in line.replace(remark_type,"").split()]
                    n_idx = int(len(meeko_idx_mapping)/2)
                    smiles_ids, meeko_ids = np.reshape(meeko_idx_mapping, (n_idx, 2)).T

                    new_ids = []
                    for old_id in meeko_ids:
                        new_ids.append(lookup_table[old_id])

                    new_idx_mapping = [str(x) for x in np.transpose([smiles_ids, new_ids]).flatten()]
                    line = remark_type+" ".join(new_idx_mapping)+"\n"

            remark_list.append(line)
            line = f.readline()

    return remark_list

# sorting atoms in both pdbqt files by ascending x, y, z
from_meeko = parse_pdbqt("Biperiden_meeko.pdbqt")
meeko_sorted = direct_sort(from_meeko)['atom_id']

from_obabel = parse_pdbqt("Biperiden_obabel.pdbqt")
obabel_sorted = direct_sort(from_obabel)['atom_id']

# building lookup table for index mapping
index_lookup = {}
for i, meeko_index in enumerate(meeko_sorted):
    index_lookup[meeko_index] = obabel_sorted[i]

# printing new remarks with updated REMARK SMILES IDX and REMARK H PARENT
my_remarks = new_remarks("Biperiden_meeko.pdbqt", index_lookup)
for line in my_remarks:
    print(line.replace("\n",""))

The only dependency is numpy. Sample files are attached... Biperiden_meeko.pdbqt.txt Biperiden_obabel.pdbqt.txt

Edits: Parsing coordinates by columns as suggested by diogomart ⌄

diogomart commented 10 months ago

@rwxayheee that's cool stuff! Happy to see such advanced usage :-)

Just a comment on parsing x,y,z from PDB(QT) files, it's safer to parse

x = line[30:38]
y = line[38:46]
z = line[46:54]

because there might be no white space in characters 30 through 54, for example, if x, y, and z are all -100:

-100.000-100.000-100.000

rwxayheee commented 10 months ago

@rwxayheee that's cool stuff! Happy to see such advanced usage :-)

Just a comment on parsing x,y,z from PDB(QT) files, it's safer to parse
x = line[30:38]
y = line[38:46]
z = line[46:54]
because there might be no white space in characters 30 through 54, for example, if x, y, and z are all -100:
-100.000-100.000-100.000

Got it! @diogomart, thank you a lot for the kind advice!!

Le-Phung-Hien commented 10 months ago

Hi @rwxayheee @diogomart,

Thanks a lot!

forlilab / Meeko

mk_export.py: Using --original_input for gnina pdbqt results #66