How do you check for a truncated trajectory? I can combine this with #26.
So in my case, it was not just a case of 'truncated trajectories'; I think the file was corrupted somehow. Below is the error I get. To check for this, you could certainly try md.load
and see if this error pops up, but that might be expensive in terms of memory...
If you want a bad file to check this on, feel free to use /cbio/jclab/projects/fah/fah-data/munged-with-time/no-solvent/11407/run3-clone20.h5.
Traceback (most recent call last):
  File "DFG_dihedral_CK2_SYK.py", line 40, in <module>
    [SYK_dihedral] = DFG_dihedral(SYK_trajectories, SYK_DFG)
  File "DFG_dihedral_CK2_SYK.py", line 31, in DFG_dihedral
    for traj in trajectories:
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/msmbuilder-3.2.0-py2.7-linux-x86_64.egg/msmbuilder/dataset.py", line 203, in __iter__
    yield self.get(key)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/msmbuilder-3.2.0-py2.7-linux-x86_64.egg/msmbuilder/dataset.py", line 414, in get
    atom_indices=self.atom_indices)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/mdtraj/core/trajectory.py", line 420, in load
    value = loader(filename, **kwargs)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/mdtraj/formats/hdf5.py", line 113, in load_hdf5
    with HDF5TrajectoryFile(filename) as f:
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/mdtraj/formats/hdf5.py", line 189, in __init__
    self._handle = self._open_file(filename, mode=mode, filters=compression)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/tables/file.py", line 318, in open_file
    return File(filename, mode, title, root_uep, filters, **kwargs)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/tables/file.py", line 784, in __init__
    self._g_new(filename, mode, **params)
  File "tables/hdf5extension.pyx", line 488, in tables.hdf5extension.File._g_new (tables/hdf5extension.c:5458)
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5F.c", line 604, in H5Fopen
    unable to open file
  File "H5Fint.c", line 1085, in H5F_open
    unable to read superblock
  File "H5Fsuper.c", line 277, in H5F_super_read
    file signature not found

End of HDF5 error back trace

Unable to open/create file '/cbio/jclab/projects/fah/fah-data/munged-with-time/no-solvent/11407/run3-clone20.h5'
Looks like you can just use md.open(filename).
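Something like this sketch (mine, not code from the repo) illustrates the idea - md.open only has to read the file header, so it catches the "file signature not found" error without pulling the whole trajectory into memory:

import mdtraj as md

def is_readable(filename):
    # Try to open the trajectory file; md.open only reads the header,
    # so a missing or corrupted HDF5 signature fails fast and cheaply.
    try:
        f = md.open(filename)
        f.close()
        return True
    except Exception:
        return False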
Open that file and a good one in a text editor - the bad one is missing the HDF5 header, in addition to being far too short, of course. Perhaps when the file is being prepared the header gets appended last, so interrupting the script leaves h5 files without headers?
That would make sense, since the header holds info about the whole file, I think.
Anyway, I'm just going to check, every few iterations of a project, that no files throw an exception on md.open,
and delete the ones that do, which will have the pipeline regenerate them.
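Roughly, that cleanup pass could look like the following (a sketch only; the project directory is just the example path from above, and the glob pattern is an assumption):

import glob
import os
import mdtraj as md

project_dir = '/cbio/jclab/projects/fah/fah-data/munged-with-time/no-solvent/11407'

for filename in sorted(glob.glob(os.path.join(project_dir, '*.h5'))):
    try:
        f = md.open(filename)   # cheap header check, no coordinates loaded
        f.close()
    except Exception as e:
        print('Deleting unreadable trajectory %s (%s)' % (filename, e))
        os.remove(filename)     # the munging pipeline will regenerate it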
I think the code already tries to open every HDF5 file each iteration:
One of two things is happening:
trj_file = HDF5TrajectoryFile(output_filename, mode='a')
does not throw an exception for corrupted files. If that is the case, we should add an explicit md.open(output_filename)
check beforehand to see whether the file is corrupted, handle that, and if there is no corruption, close the file before proceeding. This is tackled in #27.
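A sketch of roughly what that explicit check could look like (names other than output_filename and HDF5TrajectoryFile are assumptions, not the actual munging code; the real fix lives in #27):

import os
import mdtraj as md
from mdtraj.formats.hdf5 import HDF5TrajectoryFile

# Verify any existing file opens cleanly before re-opening it for appending;
# if it is corrupted or truncated, delete it so it is rebuilt from scratch.
if os.path.exists(output_filename):
    try:
        f = md.open(output_filename)
        f.close()
    except Exception:
        os.remove(output_filename)

trj_file = HDF5TrajectoryFile(output_filename, mode='a')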
It looks like some munged trajectories are getting truncated, potentially due to imperfect protection when the munging script is killed. We should periodically check all trajectories for integrity and regenerate the ones that fail the check, which can probably be done just by deleting the munged trajectory so it is automatically regenerated.