Reading a multi-frame PDB file created by VMD

maxscheurer commented 7 years ago

Expected behaviour

Reading a multi-frame PDB file created by VMD into mda.Universe

Actual behaviour

Crash on PBD.py

_read_frame

because the VMD multi-frame PDB files do not contain MODEL/ENDMDL keywords. Instead, VMD prints END after every frame to the PDB file. Is there a way to make MDAnalysis robust against that?

Code to reproduce the behaviour

import MDAnalysis as mda

u = mda.Universe(psf, multiframe_pdb_from_VMD)

....

Currently version of MDAnalysis:

0.15.0

orbeckst commented 7 years ago

I generated an example file with VMD 1.9.3, using our MDAnalysis.tests.datafiles.PDB_multiframe as an input for VMD. The file is attached (zip compressed so that GitHub accepts it): nmr_neopetrosiamide_vmd193.pdb.zip

import MDAnalysis as mda
mda.Universe("./nmr_neopetrosiamide_vmd193.pdb")

fails with

IndexError                                Traceback (most recent call last)
<ipython-input-13-45b5e37c746d> in <module>()
----> 1 mda.Universe("./nmr_neopetrosiamide_vmd193.pdb")

/Users/oliver/.virtualenvs/mda_clean/lib/python2.7/site-packages/MDAnalysis/core/universe.pyc in __init__(self, *args, **kwargs)
    268             else:
    269                 coordinatefile = args[1:]
--> 270             self.load_new(coordinatefile, **kwargs)
    271
    272         # Check for guess_bonds

/Users/oliver/.virtualenvs/mda_clean/lib/python2.7/site-packages/MDAnalysis/core/universe.pyc in load_new(self, filename, format, in_memory, **kwargs)
    416         kwargs['n_atoms'] = self.atoms.n_atoms
    417
--> 418         self.trajectory = reader(filename, **kwargs)
    419         if self.trajectory.n_atoms != len(self.atoms):
    420             raise ValueError("The topology and {form} trajectory files don't"

/Users/oliver/.virtualenvs/mda_clean/lib/python2.7/site-packages/MDAnalysis/coordinates/PDB.pyc in __init__(self, filename, **kwargs)
    311         self.n_frames = len(offsets)
    312
--> 313         self._read_frame(0)
    314
    315     def Writer(self, filename, **kwargs):

/Users/oliver/.virtualenvs/mda_clean/lib/python2.7/site-packages/MDAnalysis/coordinates/PDB.pyc in _read_frame(self, frame)
    364                 self.ts._pos[pos] = [line[30:38],
    365                                      line[38:46],
--> 366                                      line[46:54]]
    367                 # TODO import bfactors - might these change?
    368                 try:

IndexError: index 392 is out of bounds for axis 0 with size 392

I don't know what the best way is to support the VMD PDB trajectory format. Some ideas

require an explicit format specifier for VMD multiframe pdbs and write a separate reader (kludgy and not very D.R.Y.)
check for multiple END tags in the file before reading (or when encountering an index error for a multiframe PDB – we already do this in the loop starting at coordinates/PDB.py: L271
... ?

kain88-de commented 7 years ago

@maxscheurer as a workaround. We can read the xyz format of vmd.

richardjgowers commented 7 years ago

I've not got time to try this, but I think it might work if you added if line.startswith('END'): models.append(pdbfile.tell()) into the code @orbeckst linked to above. This would add END as a trigger for a new frame... But you probably need to check that the file hasn't ended at that point, as that's what END is meant to indicate. So maybe creating a list of the offsets at END points, and then removing the last one. But then if you've got two newframe triggers (MODEL and END) you need to make sure you don't count frames twice if the PDB file has both..

stgzr commented 1 year ago

Same issue needs help (MDAnaylsis == 2.3.0)

orbeckst commented 1 year ago

Use chemfiles for the trajectory part:

u = mda.Universe("nmr_neopetrosiamide_vmd193.pdb", format="CHEMFILES", topology_format="PDB")

but use MDAnalysis for the PDB part: The MDA PDBParser is tricked into only reading the first frame by the END record and gets all the information it needs. CHEMFILES can deal with VMD-generated PDB "trajectories" just fine and altogether, you should get a working universe.

orbeckst commented 1 year ago

Note: If you don't have chemfiles installed, you must install chemfiles > 0.10 at time of writing:

mamba install 'chemfiles>0.10'

Thanks to @Luthaf for the comment in #2731 that chemfiles can read VMD PDB files.

tcolburn commented 1 year ago

@orbeckst This worked like a charm for me! Thanks for the update

MDAnalysis / mdanalysis