Problem with pdb format

forlilab / Meeko

Interfacing RDKit and AutoDock

GNU Lesser General Public License v2.1


Problem with pdb format #1

Closed: avnikonenko closed this issue 2 years ago

avnikonenko commented 3 years ago

Dear authors, thank you for the Meeko package! I use RDKit to parse PDBQT files, but the PDBQT files written by Meeko cannot be read. I found that this happens because carbon atom names from C10 onward start at column 13 instead of column 14. The PDB format specification (https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html) says: "Alignment of one-letter atom name such as C starts at column 14, while two-letter atom name such as FE starts at column 13."
Chlorine causes a second problem: the files contain "Cl", but RDKit requires "CL". The specification says nothing about case, but all of its examples are upper case. Example files: CHEMBL482567.pdbqt.txt, CHEMBL482567_correct.pdbqt.txt. Could you please fix this?
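A minimal sketch of the alignment rule quoted above, assuming the element symbol of every atom is already known. The helper names `pdb_atom_name_field` and `hetatm_line` are hypothetical (they are not part of Meeko or RDKit), the residue name `UNL`, chain `A`, occupancy 1.00 and temperature factor 0.00 are placeholders, and upper-casing follows the wwPDB examples rather than an explicit rule in the spec excerpt.

```python
def pdb_atom_name_field(name: str, element: str) -> str:
    """Return the 4-character PDB atom name field (columns 13-16)."""
    # Assumption: upper case, as in all wwPDB examples ("CL", "FE").
    name = name.upper()
    element = element.upper()
    if len(name) >= 4 or len(element) == 2:
        # Four-character names and two-letter elements start at column 13.
        return name[:4].ljust(4)
    # One-letter element (e.g. C in "C10"): column 13 stays blank,
    # so the name starts at column 14.
    return (" " + name).ljust(4)


def hetatm_line(serial: int, name: str, element: str,
                x: float, y: float, z: float,
                resname: str = "UNL", chain: str = "A", resseq: int = 1) -> str:
    """Build a HETATM record with the element symbol in columns 77-78."""
    return (
        f"HETATM{serial:>5} {pdb_atom_name_field(name, element)} "
        f"{resname:>3} {chain:1}{resseq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
        + " " * 10
        + f"{element.upper():>2}"
    )


print(hetatm_line(13, "C10", "C", 1.0, 2.0, 3.0))
print(hetatm_line(14, "Cl1", "Cl", 0.0, 0.0, 0.0))
```

With this layout, columns 13-14 of "C10" read " C" and those of "Cl1" read "CL", so a reader that derives the element from the first two name columns sees carbon and chlorine respectively.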

diogomart commented 3 years ago

Hello!

Thanks for reporting this. We'll look into it.

If you used Meeko to prepare the PDBQT ligand files from SDF or MOL2, the script mk_copy_coords.py can write SDF/MOL2 files with the docked coordinates, using the original SDF/MOL2 as a template. The advantage is that the bond information from the original SDF/MOL2 is preserved. OpenBabel is used to calculate the positions of non-polar hydrogens.

diogomart commented 3 years ago

Question: do you use the MolFromPDBFile function?

For me, it chokes on the atom type, because PDBQT files use "A" for aromatic carbons.

$ python
Python 3.6.12 | packaged by conda-forge | (default, Dec  9 2020, 00:36:02) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import rdkit
>>> rdkit.__version__
'2020.09.4'
>>> from rdkit import Chem
>>> mol = Chem.MolFromPDBFile("CHEMBL482567.pdbqt.txt")
[16:39:50] 

****
Post-condition Violation
Element 'A' not found
Violation occurred on line 91 in file /home/conda/feedstock_root/build_artifacts/rdkit_1611244806481/work/Code/GraphMol/PeriodicTable.h
Failed Expression: anum > -1
****
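One way around the "A" type is to map AutoDock atom types back to element symbols before handing the atoms to a PDB reader. The sketch below is hypothetical (the dictionary and function are not part of Meeko or RDKit) and deliberately incomplete: it covers only a few common AutoDock4 ligand types, assumes any other type is already an element symbol, and assumes the atom type is the last whitespace-separated field of each ATOM/HETATM record.

```python
# Partial, hypothetical lookup covering only common AutoDock4 ligand types.
AD_TYPE_TO_ELEMENT = {
    "A": "C",   # aromatic carbon
    "NA": "N",  # nitrogen, hydrogen-bond acceptor
    "OA": "O",  # oxygen, hydrogen-bond acceptor
    "SA": "S",  # sulfur, hydrogen-bond acceptor
    "HD": "H",  # hydrogen bonded to a donor
}


def element_from_pdbqt_line(line: str) -> str:
    """Guess the element of a PDBQT ATOM/HETATM record from its AutoDock type."""
    ad_type = line.split()[-1]  # assumption: the type is the last field
    if ad_type in AD_TYPE_TO_ELEMENT:
        return AD_TYPE_TO_ELEMENT[ad_type]
    # Otherwise treat the type as an element symbol: "Cl" or "CL" -> "Cl".
    return ad_type.capitalize()
```

The explicit table matters because a naive fallback would misread some types: "NA" (acceptor nitrogen) would otherwise become sodium, and "A" is not an element at all.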

avnikonenko commented 3 years ago

To convert from PDBQT to PDB, everything after column 66 should be removed (the charge (Q) and atom type (T) columns), as described at http://autodock.scripps.edu/faqs-help/faq/is-there-a-way-to-save-a-protein-ligand-complex-as-a-pdb-file-in-autodock. Here is my code:

Python 3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0] on linux
>>> import rdkit
>>> rdkit.__version__
'2020.03.3'
>>> from rdkit import Chem
>>> with open('CHEMBL482567.pdbqt.txt') as f, open('CHEMBL482567_correct.pdbqt.txt') as f_correct:
...     data, data_correct = f.read(), f_correct.read()
...
>>> mol = Chem.MolFromPDBBlock('\n'.join([i[:66] for i in data.split('\n')]))
[10:38:02] Cannot determine element for PDB atom #13
>>> mol_correct = Chem.MolFromPDBBlock('\n'.join([i[:66] for i in data_correct.split('\n')]))
>>> mol

>>> mol_correct
<rdkit.Chem.rdchem.Mol object at 0x7f429e642f80>
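The same column-66 truncation can be wrapped in a small function that also drops the PDBQT torsion-tree records (ROOT, ENDROOT, BRANCH, ENDBRANCH, TORSDOF), which have no PDB equivalent. This is a hypothetical helper, not part of Meeko; it does not realign atom names, so a file with the column-13 problem described in the first post would still fail to parse.

```python
# Records that only appear in PDBQT files (the ligand torsion tree).
PDBQT_ONLY_RECORDS = {"ROOT", "ENDROOT", "BRANCH", "ENDBRANCH", "TORSDOF"}


def pdbqt_to_pdb_block(pdbqt_text: str) -> str:
    """Return PDB text: ATOM/HETATM cut at column 66, torsion-tree records removed."""
    out = []
    for line in pdbqt_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if line.startswith(("ATOM", "HETATM")):
            # Columns 1-66 hold the standard PDB fields; the charge (Q)
            # and AutoDock atom type (T) come after column 66.
            out.append(line[:66])
        elif fields[0] in PDBQT_ONLY_RECORDS:
            continue
        else:
            out.append(line)  # REMARK, MODEL, ENDMDL, ... pass through
    return "\n".join(out) + "\n"
```

The returned text can then be passed to `Chem.MolFromPDBBlock`, as in the session above.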

diogomart commented 2 years ago

Thanks for sharing that script. I think you are using an older version of Meeko, possibly from conda or PyPI (we haven't created a release in a while). Using the current develop branch, RDKit successfully loaded a Meeko-generated PDBQT file for the ligand you posted.

Also, the script I mentioned (mk_copy_coords.py) is probably not available in the version you have.

avnikonenko commented 2 years ago

Yes, it works well. Thank you!