sonyahanson closed this issue 7 years ago.
Can you report this to the pyemma issue tracker? The problematic file is only 10MB, so it could easily be shared as a test case.
/cbio/jclab/home/hansons/sims/AZ/SYK/11407/run7-clone19.h5
There's nothing wrong with the HDF5 file that I can tell from examining the header:
h5dump -H /cbio/jclab/home/hansons/sims/AZ/SYK/11407/run7-clone19.h5
As suggested, I made an issue in pyemma for this: https://github.com/markovmodel/PyEMMA/issues/958
As mentioned in that issue, I tried running msm-pipeline on just that trajectory, and it seems to run fine.
So far I've seen this issue with both the staged SYK and the staged SRC trajectories. I've even tried rerunning after moving the trajectory that triggered the error into a different subfolder, but no matter how many trajectories I move, the error still comes up, each time referencing a different trajectory.
This error happens while we are outputting PDBs for the macrostates, right? Could it be that the trajectories somehow aren't lined up with the way we pull out the PDBs here?
Thanks!
Maybe this is a question for @maxentile?
The only other thing I can think of is to try the GitHub version of pyemma, since a few bugfixes have been applied since the 2.2.6 release.
Hmm, I don't yet have a guess for where this would happen.
@sonyahanson : Do you have an idea for where this might happen?
We define a source here, then project all frames from the source ultimately onto discrete trajectories in dtrajs, and then we call save_trajs using that source object here.
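To make the suspected failure mode concrete, here is a toy, pure-Python sketch of that flow. It does not use PyEMMA: `group_frames_by_state` and `extract_frames` are hypothetical stand-ins for the state indexing and frame extraction that save_trajs relies on, and trajectories are modeled as plain lists of frames. It only illustrates that if a discrete trajectory is longer than its source trajectory, a frame index sampled from it can point past the end of the file.

```python
from collections import defaultdict


def group_frames_by_state(dtrajs):
    """Map each discrete state to the (trajectory index, frame index) pairs
    where it occurs. Hypothetical stand-in for PyEMMA's state indexing."""
    index = defaultdict(list)
    for itraj, dtraj in enumerate(dtrajs):
        for iframe, state in enumerate(dtraj):
            index[state].append((itraj, iframe))
    return dict(index)


def extract_frames(trajectories, pairs):
    """Fetch the requested frames, raising if an index is out of range,
    roughly what happens when reading past the end of a trajectory file."""
    frames = []
    for itraj, iframe in pairs:
        traj = trajectories[itraj]
        if iframe >= len(traj):
            raise ValueError(
                f"frame {iframe} requested from trajectory {itraj}, "
                f"which only has {len(traj)} frames"
            )
        frames.append(traj[iframe])
    return frames


# Toy data: trajectory 1 has 3 frames, but its dtraj has 4 entries.
trajectories = [["a0", "a1", "a2"], ["b0", "b1", "b2"]]
dtrajs = [[0, 1, 0], [1, 1, 0, 1]]

index = group_frames_by_state(dtrajs)
print(extract_frames(trajectories, index[0]))  # state 0 only uses valid frames
try:
    extract_frames(trajectories, index[1])  # state 1 includes frame 3 of trajectory 1
except ValueError as err:
    print(err)
```

In this toy case state 0 extracts cleanly, while state 1 fails only because the misaligned dtraj happens to assign it the extra frame, which would be consistent with the error surfacing on whichever trajectory is misaligned rather than on one particular file.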
So, at some point, either (1) the trajectory files have changed length while the script is running (which definitely shouldn't happen), or (2) the dtrajs somehow ended up with different lengths than their corresponding input trajectories (unclear why this would happen).
I think we can add a check for (1), e.g.:
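import numpy as np

# Record the per-trajectory frame counts right after the source is created.
initial_traj_lengths = source.trajectory_lengths()
...
# Just before save_trajs: have any of the files changed length in the meantime?
if not np.array_equal(source.trajectory_lengths(), initial_traj_lengths):
    print('Something went wrong: the lengths of the trajectory files have changed!')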
We can add a check for (2) after line 102, e.g.:
# Each discrete trajectory should have exactly one entry per input frame.
if not np.array_equal(source.trajectory_lengths(), [len(dtraj) for dtraj in dtrajs]):
    print('Something went wrong: the lengths of the trajectory files have become misaligned from the lengths of their corresponding discrete trajectories!')
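A variant of these checks that raises instead of printing, and names the offending trajectories, might make it easier to tell which file is actually misaligned. This is a sketch that works on plain sequences of lengths; in the script the inputs would come from `source.trajectory_lengths()` and `[len(dtraj) for dtraj in dtrajs]`. The function names `find_length_mismatches` and `assert_aligned` are hypothetical, not part of msm-pipeline or PyEMMA.

```python
def find_length_mismatches(expected_lengths, actual_lengths):
    """Return the indices whose lengths differ between two sequences.

    A difference in the number of trajectories raises ValueError, since
    comparing them one by one would then be meaningless.
    """
    expected = list(expected_lengths)
    actual = list(actual_lengths)
    if len(expected) != len(actual):
        raise ValueError(
            f"expected {len(expected)} trajectories, found {len(actual)}"
        )
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


def assert_aligned(traj_lengths, dtrajs, filenames):
    """Raise, listing filenames, if any discrete trajectory length differs
    from the number of frames in its trajectory file."""
    bad = find_length_mismatches(traj_lengths, [len(d) for d in dtrajs])
    if bad:
        details = ", ".join(
            f"{filenames[i]} ({traj_lengths[i]} frames vs {len(dtrajs[i])} dtraj entries)"
            for i in bad
        )
        raise ValueError(f"misaligned trajectories: {details}")
```

In the script this could be called as `assert_aligned(source.trajectory_lengths(), dtrajs, source.filenames)`, assuming the source object exposes its input paths as `filenames`. Failing loudly at this point would separate case (2) from an error raised later inside save_trajs.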
This has been solved. Updating pyemma has eliminated these issues.
Getting ValueError even with staged trajectories.
Josh and I have also seen this with non-staged trajectories (/cbio/jclab/projects/fah/fah-data/munged3/no-solvent/11401/), but we thought it would not happen with staged trajectories. Also note the pyemma version:
I'm going to run this on other staged trajectories to see whether it's just a problem with the copying process.
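One way to rule the copying process in or out is to compare a checksum of each original file with its staged copy. Here is a minimal sketch using only the standard library. It assumes that originals and copies can be matched by relative path under two root directories and that the files of interest match `*.h5`; both are assumptions, not a description of how the staging was actually done.

```python
import hashlib
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Hash a file in chunks so large trajectories need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_copies(original_root, staged_root, pattern="*.h5"):
    """Return relative paths that are missing from, or differ in, the staged tree."""
    original_root, staged_root = Path(original_root), Path(staged_root)
    bad = []
    for original in sorted(original_root.rglob(pattern)):
        relative = original.relative_to(original_root)
        staged = staged_root / relative
        if not staged.exists() or sha256sum(original) != sha256sum(staged):
            bad.append(str(relative))
    return bad
```

An empty result would suggest the staged files are byte-for-byte identical to the originals, which would point back toward the analysis rather than the copy.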