Correct Chain Seperation in PDB

kain88-de commented 8 years ago

When we write a PDB file with multiple frames it is not possible to read the frames correctly using the BioPython PDBParser. We use ENDMDL to finish a frame but I have more commonly seen that others use a TER to end a frame.

I haven't checked yet what gromacs does when I convert a trajectory to PDB or what the BioPython writer produces.

To see what is wrong

>>> pdb = mda.coordinates.PDB.PDBReader('test.pdb')
>>> pdb.n_atoms
25

The number of atoms should be 5.

test.pdb

HEADER    
TITLE     MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1   80.000   80.000   80.000  60.00  60.00  90.00 P 1           1
MODEL         1
ATOM      1  CA  MET     1       0.000   1.000   2.000  1.00 84.71           C
ATOM      2  CA  ARG     2       3.000   4.000   5.000  1.00 68.59           C
ATOM      3  CA  ILE     3       6.000   7.000   8.000  1.00 48.30           C
ATOM      4  CA  LYS     4       9.000  10.000  11.000  1.00 49.38           C
ATOM      5  CA  LEU     5      12.000  13.000  14.000  1.00 42.10           C
ENDMDL
HEADER    
TITLE     MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1   80.000   80.000   80.000  60.00  60.00  90.00 P 1           1
MODEL         2
ATOM      1  CA  MET     1       0.000   2.000   4.000  1.00 84.71           C
ATOM      2  CA  ARG     2       6.000   8.000  10.000  1.00 68.59           C
ATOM      3  CA  ILE     3      12.000  14.000  16.000  1.00 48.30           C
ATOM      4  CA  LYS     4      18.000  20.000  22.000  1.00 49.38           C
ATOM      5  CA  LEU     5      24.000  26.000  28.000  1.00 42.10           C
ENDMDL
HEADER    
TITLE     MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1   80.000   80.000   80.000  60.00  60.00  90.00 P 1           1
MODEL         3
ATOM      1  CA  MET     1       0.000   4.000   8.000  1.00 84.71           C
ATOM      2  CA  ARG     2      12.000  16.000  20.000  1.00 68.59           C
ATOM      3  CA  ILE     3      24.000  28.000  32.000  1.00 48.30           C
ATOM      4  CA  LYS     4      36.000  40.000  44.000  1.00 49.38           C
ATOM      5  CA  LEU     5      48.000  52.000  56.000  1.00 42.10           C
ENDMDL
HEADER    
TITLE     MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1   80.000   80.000   80.000  60.00  60.00  90.00 P 1           1
MODEL         4
ATOM      1  CA  MET     1       0.000   8.000  16.000  1.00 84.71           C
ATOM      2  CA  ARG     2      24.000  32.000  40.000  1.00 68.59           C
ATOM      3  CA  ILE     3      48.000  56.000  64.000  1.00 48.30           C
ATOM      4  CA  LYS     4      72.000  80.000  88.000  1.00 49.38           C
ATOM      5  CA  LEU     5      96.000 104.000 112.000  1.00 42.10           C
ENDMDL
HEADER    
TITLE     MDANALYSIS FRAMES FROM 0, SKIP 1: Created by PrimitivePDBWriter
CRYST1   80.000   80.000   80.000  60.00  60.00  90.00 P 1           1
MODEL         5
ATOM      1  CA  MET     1       0.000  16.000  32.000  1.00 84.71           C
ATOM      2  CA  ARG     2      48.000  64.000  80.000  1.00 68.59           C
ATOM      3  CA  ILE     3      96.000 112.000 128.000  1.00 48.30           C
ATOM      4  CA  LYS     4     144.000 160.000 176.000  1.00 49.38           C
ATOM      5  CA  LEU     5     192.000 208.000 224.000  1.00 42.10           C
ENDMDL
END

jbarnoud commented 8 years ago

From what I see (http://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ENDMDL, http://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#TER), the best is probably to have both.

orbeckst commented 8 years ago

When we implemented it we went with what the PDB standard said for multi frame files (i.e. NMR), which means MODEL records..

By the way, I am pretty sure that your PDB example is illegal format because I read HEADER and TITLE that there can only be one of these records and not have TITLE and HEADER records in each model; this is said explicitly under record types where HEADER is listed as a record of which only a single line may occur.

orbeckst commented 8 years ago

... so if the PDB "trajectory" was produced with MDAnalysis then that's a bug.

kain88-de commented 8 years ago

You guessed right. This is a PDB written by our PrimitivePDBWriter. I'll check what exactly happens.

orbeckst commented 8 years ago

I think you can simply change the issue title and label it as defect.

Oliver Beckstein email: orbeckst@gmail.com

Am Feb 24, 2016 um 0:10 schrieb kain88-de notifications@github.com:

You guessed right. This is a PDB written by our PrimitivePDBWriter. I'll check what exactly happens.

— Reply to this email directly or view it on GitHub.

kain88-de commented 8 years ago

No please open another issue. The header thing has nothing to do with the frame separation.

kain88-de commented 8 years ago

This is a good bug if you want to get started contributing MDAnalysis. The Problem is that we only write a ENDMDL record to distinguish between frames while we should write a ENDMDL and TER record.

This can be fixed in MDAnalysis/coordinates/PDB.py:PrimitivePDBWriter

backpropper commented 8 years ago

I am working on this bug. What I understand is that the TER record should be written (before or after every ENDML, doesn't matter, right?)

jbarnoud commented 8 years ago

It does matter that the TER is before the ENDMDL. The TER closes a chain and is part of that chain. ENDMDL closes a model, that contains one or several chains, everything that is part of the model should be between the MODEL and the ENDMDL records.

backpropper commented 8 years ago

Since the TER issue is now fixed, is the bug solved?

kain88-de commented 8 years ago

Renamed the bug because it is about the TER record to separate chains

dim-99 commented 1 month ago

Is this issue still relevant? I would like to try.

MDAnalysis / mdanalysis

Correct Chain Seperation in PDB #734