forlilab / Meeko

Interfacing RDKit and AutoDock
GNU Lesser General Public License v2.1
169 stars 41 forks source link

--add_index_map flag is not functional #110

Closed Klaborator closed 2 months ago

Klaborator commented 2 months ago

Hello.

The --add_index_map command line flag isn't causing an atom index to be created in the pdbqt.

It appears that the add_index_map is hard coded to False in preparation.py (Class MoleculePreparation) and writer.py (def write_string).

Setting these to True generates the index map in the pdbqt, but it wasn't what I expected (a list of atom pairs): REMARK INDEX MAP 19 1 87 2 14 5 13 6 12 7 18 8 16 9 17 10 3 11 4 12 2 15 1 16 REMARK INDEX MAP 59 17 65 18 60 19 58 23 57 24 131 25 52 28 53 30 123 31 61 35 REMARK INDEX MAP 62 36 63 37 64 40 135 41 5 45 6 46 7 47 8 48 9 49 73 50 10 51 REMARK INDEX MAP 11 52 74 54 75 55 76 56 77 57 15 62 80 63 20 67 21 68 22 69 REMARK INDEX MAP 30 70 100 73 31 75 32 76 33 77 41 78 113 81 42 83 43 84 44 85 REMARK INDEX MAP 50 86 120 89 51 91 54 93 55 94 56 95 45 96 46 97 47 98 48 99 REMARK INDEX MAP 49 102 119 103 34 107 35 108 36 109 37 110 38 111 108 112 REMARK INDEX MAP 39 113 40 114 109 116 110 117 111 118 112 119 23 124 24 125 REMARK INDEX MAP 25 126 26 127 27 128 95 129 28 130 29 131 96 133 97 134 REMARK INDEX MAP 98 135 99 136

I would like to use the atom map to transfer the docked coordinates (in the dlg file) back to the input file. Am I missing something? Is this flag no longer supported? Is there a better way to achieve the coordinate transfer?

Thanks -K

rwxayheee commented 2 months ago

Hi @Klaborator

It appears that the add_index_map is hard coded to False in preparation.py (Class MoleculePreparation) and writer.py (def write_string).

It's not hard-coded to False (False is the default), but may not be correctly passed to the writer function in mk_prepare_ligand.py, in: https://github.com/forlilab/Meeko/blob/0badd5b04b5f749f0941f83ce08cdbc206dcd665/scripts/mk_prepare_ligand.py#L343C56-L343C86 and: https://github.com/forlilab/Meeko/blob/0badd5b04b5f749f0941f83ce08cdbc206dcd665/scripts/mk_prepare_ligand.py#L357 The setup molsetup here doesn't seem to include add_index_map

In Python it works as expected, when we do

PDBQTWriterLegacy.write_string(molsetup, add_index_map=True)

It returns a mapping which shows you how the ordering of the atoms change, from your input to the PDBQT output. For your output

REMARK INDEX MAP 19 1 87 2 14 5 13 6 12 7 18 8 16 9 17 10 3 11 4 12 2 15 1 16

This means what used to be atom ID 19 in your input file is now the 1st atom in the PDBQT output. Original atom number 87 is now the 2nd, and original atom number 14 is now the 5th, ...

I would like to use the atom map to transfer the docked coordinates (in the dlg file) back to the input file.

You can make use of the index mapping, but there might be easier ways! You can use mk_export.py to write docking poses in DLG files to SDF files, and work from there

Klaborator commented 2 months ago

Thanks for explaining the remark formatting. Now that I understand it, I think this will fix the flag: pdbqt_string, success, error_msg = PDBQTWriterLegacy.write_string(molsetup, bad_charge_ok=args.bad_charge_ok, add_index_map=args.add_index_map)

It might be needed in the if is_covalent block, too: [pdbqt_string, success, error_msg = PDBQTWriterLegacy.write_string(molsetup, bad_charge_ok=args.bad_charge_ok)](https://github.com/forlilab/Meeko/blob/0badd5b04b5f749f0941f83ce08cdbc206dcd665/scripts/mk_prepare_ligand.py#L343)

The output looks right... I guess I'll find out.

You can make use of the index mapping, but there might be easier ways! You can use mk_export.py to write docking poses in DLG files to SDF files, and work from there

Yeah, mk_export.py is good. The SDF format is nice, but putting the L/D amino acid residue names/symbols back into the output is not trivial. I need to return the peptide ligands to PDB format with all the residue information. If you have any suggestions, I'm all ears.

Klaborator commented 2 months ago

I sent a pull to add the arg. But I thought it best not to include this writer.py change. It helps my brain process the index map (in case anyone is interested and doesn't mind having a line for each atom):

@staticmethod
def break_long_remark_lines(strings, prefix, max_line_length=79):
    if prefix == "REMARK INDEX MAP":
        remarks = []
        for string in strings: 
            remarks.append(prefix + string)
        #REMARK INDEX MAP <Original-atom-number> <New-atom-number>
        return remarks
    else:
        remarks = [prefix]
        for string in strings:
            if (len(remarks[-1]) + len(string)) < max_line_length:
                remarks[-1] += string
            else:
                remarks.append(prefix + string)
        return remarks
rwxayheee commented 2 months ago

but putting the L/D amino acid residue names/symbols back into the output is not trivial.

There might already be some tools to assign residue names based on structures. But you could use more mapping if you want to 100% preserve the atom names.

By Meeko's INDEX MAP, we got: PDBQT Atom Order ID maps to Original Atom Order ID

Similarly, by setting up index mapping from your original file, you can get: Original Atom Order ID maps to Atom Name, or Residue Name, and Residue ID

Finally, instead of writing a new file (PDBQT or SDF), you could get coordinates from the docking output, and put them back to the peptide structure file you started with

Klaborator commented 2 months ago

Thanks. I think coordinate transfer using the index map is probably the safest.