forlilab / Meeko

Interfacing RDKit and AutoDock
GNU Lesser General Public License v2.1

calc_box function fails to handle some PDBQT formatting #137

Closed rkosai closed 3 months ago

rkosai commented 3 months ago

I have a PDBQT file generated by ADFR prepare_receptor. The file is mostly formatted consistently, but some lines are not aligned to the expected columns. For example:

ATOM   2625  CE ALYS A 272      18.616   6.818  36.784  0.62 36.28     0.229 C
ATOM   2626  NZ ALYS A 272      17.272   6.342  37.173  0.62 32.93    -0.079 N
ATOM   2627  HZ A1LYS A 272      17.283   5.375  37.499  1.00  0.00     0.274 HD
ATOM   2628  HZ A2LYS A 272      16.603   6.467  36.413  1.00  0.00     0.274 HD
ATOM   2629  HZ A3LYS A 272      16.849   6.959  37.866  1.00  0.00     0.274 HD
ATOM   2630  N   GLY A 273      23.117  11.708  34.554  1.00 25.80    -0.351 N

This breaks calc_box in gridbox.py because it's using positional offsets.

            if line.startswith('ATOM') or line.startswith('HETATM'):
                x = float(line[30:38])
                y = float(line[38:46])
                z = float(line[46:54])

Is this considered a bug with Meeko or with prepare_receptor? If the former, I can look at a patch.
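To make the failure mode concrete, here is a minimal sketch of column-based coordinate parsing, using the same slices as the snippet above. The helper name `parse_xyz_by_column` and the way the test records are built are assumptions for illustration, not Meeko code. The `shifted` record imitates the `A1LYS` pattern by inserting one extra character before the residue name.

```python
def parse_xyz_by_column(line):
    """Read x, y, z from the fixed PDB/PDBQT columns 31-38, 39-46 and 47-54."""
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


# A well-formed record built with explicit field widths (values from atom 2626).
good = "ATOM  {:>5d} {:<4s}{:1s}{:>3s} {:1s}{:>4d}    {:8.3f}{:8.3f}{:8.3f}".format(
    2626, " NZ ", "A", "LYS", "A", 272, 17.272, 6.342, 37.173
)

# Simulate the reported problem: one extra character before the residue name
# pushes every later field one column to the right.
shifted = good[:17] + "1" + good[17:]

print(parse_xyz_by_column(good))
try:
    parse_xyz_by_column(shifted)
except ValueError as err:
    # The y slice now starts with the last digit of x, so float() rejects it.
    print("column parse failed:", err)
```

With a one-column shift the x slice still happens to parse (with a truncated value), and it is the y slice that raises `ValueError`.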

rwxayheee commented 3 months ago

Hi @rkosai, PDBQT is expected to be parsed by column numbers. The code in Meeko should work if the formatting is correct. Could you post your input PDB file? The misalignment is likely a result of this LYS having multiple conformations. The expected format is:

ATOM   2627  HZ1 ALYS A 272      17.283   5.375  37.499  1.00  0.00     0.274 HD
ATOM   2628  HZ2 ALYS A 272      16.603   6.467  36.413  1.00  0.00     0.274 HD
ATOM   2629  HZ3 ALYS A 272      16.849   6.959  37.866  1.00  0.00     0.274 HD

rkosai commented 3 months ago

Hi @rwxayheee,

Here is the file. It was generated by prepare_receptor from the ADFR suite, using the -A hydrogens command line flag.

receptor.pdbqt.txt

rwxayheee commented 3 months ago

Hi @rkosai, can you post the input (PDB) file you used to generate this PDBQT?

It might be something with the -A hydrogens flag. We can't provide help with fixing prepare_receptor, but there might be a workaround. If you want to add hydrogens, you could try reduce, a third-party program; a version of it is included in ADFRsuite.

rkosai commented 3 months ago

Hi @rwxayheee,

The file is here: receptor.pdb.txt

It works fine without the -A hydrogens flag. My assertion is that the calc_box function should robustly handle valid PDBQT files, which I believe this is. I can write a function on my end that handles them, but I figured I'd bring it up in case you wanted to patch it.

If you don't see this as an issue that should be supported, no problem at all.
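For reference, a user-side workaround of the kind mentioned above could look like the sketch below. It is not Meeko's calc_box: `tolerant_xyz` and `bounding_box` are hypothetical helpers, and the sketch assumes that the coordinates are the first three consecutive numbers written with exactly three decimals on each ATOM/HETATM line.

```python
import re

# Three consecutive numbers with exactly three decimals, as produced by the
# %8.3f coordinate fields; \s* also tolerates fields that touch each other.
_XYZ = re.compile(r"(-?\d+\.\d{3})\s*(-?\d+\.\d{3})\s*(-?\d+\.\d{3})")


def tolerant_xyz(line):
    """Hypothetical helper: locate x, y, z by pattern instead of by column."""
    match = _XYZ.search(line)
    if match is None:
        raise ValueError(f"no coordinates found in line: {line!r}")
    return tuple(float(value) for value in match.groups())


def bounding_box(lines, padding=0.0):
    """Return (center, size) of the padded box around all ATOM/HETATM records."""
    coords = [tolerant_xyz(l) for l in lines if l.startswith(("ATOM", "HETATM"))]
    if not coords:
        raise ValueError("no ATOM/HETATM records found")
    lows = [min(c[i] for c in coords) - padding for i in range(3)]
    highs = [max(c[i] for c in coords) + padding for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))
    size = tuple(hi - lo for lo, hi in zip(lows, highs))
    return center, size


# Two records from the first post, including one of the shifted ones.
records = [
    "ATOM   2627  HZ A1LYS A 272      17.283   5.375  37.499  1.00  0.00     0.274 HD",
    "ATOM   2630  N   GLY A 273      23.117  11.708  34.554  1.00 25.80    -0.351 N",
]
center, size = bounding_box(records, padding=5.0)
```

The trade-off is that pattern matching accepts lines that column-based tools may still reject, so it only helps for the box calculation, not for downstream PDBQT consumers.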

rwxayheee commented 3 months ago

Hi @rkosai, the PDBQT in the first post isn't in a valid format (please see here for the expected layout), for two reasons: (1) PDBQT is supposed to be parsed by columns, as in gridbox.py. There are pros and cons to parsing by columns versus parsing by items, but my impression is that, by design, it should be formatted in columns, like PDB. (2) The atom names are wrong for the added hydrogens, due to issues with prepare_receptor -A hydrogens. Your starting PDB file is fine. Thanks for bringing this to our attention.

rkosai commented 3 months ago

Okay, great. Feel free to close.

rwxayheee commented 3 months ago

Thanks again for reporting this, @rkosai. Please feel free to re-open if you have comments, thoughts or any further issues related to this.