sonyahanson opened this issue 8 years ago
Not quite sure how these are all related, but so far running `pipeline.py` on three different datasets gets three different errors (the first is in #12). I did not get any of these problems running on a small number of trajectories (`python pipeline.py '/cbio/jclab/projects/fah/fah-data/munged3/no-solvent/11400/run0-clone*0.h5'`). The new errors are (for 11400):

(for 11406):
Thanks for pointing these out! Will try to see what's causing these...
For the `bincount` error, I guess that means that some entries in `dtrajs` are -1? Not sure where that would happen.
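(I'm not certain where `bincount` gets called inside `pipeline.py`, but assuming it's applied to each discrete trajectory, a minimal sketch of the failure and how to spot the bad entries:)

```python
import numpy as np

# Assumption: np.bincount is applied per discrete trajectory somewhere
# downstream; it rejects negative entries outright with a ValueError.
dtraj = np.array([149, 415, 415, 149, -1])
try:
    np.bincount(dtraj)
except ValueError as err:
    print(err)

# Locate the offending (unassigned) frames:
print(np.where(dtraj < 0)[0])
```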
For the `frames_from_files` error, my best guess is that `source` and `source_full` are seeing different-length trajectories, since the munging pipeline is running at the same time in that directory. How do you think we should tackle this? Maybe just moving the definition of `source_full` from line 128 to be immediately after the definition of `source` on line 98 would solve that issue...
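Along those lines, a sanity check like this could at least fail fast on the mismatch. This is only a sketch: in `pipeline.py` the two readers are built from different inputs (around lines 98 and 128), but here both use the glob from the report purely for illustration.

```python
from glob import glob

import numpy as np
import pyemma.coordinates as coor

# Pattern from the original report; if your pyemma version needs an
# explicit topology for .h5 files, pass top=... to coor.source.
files = sorted(glob('/cbio/jclab/projects/fah/fah-data/munged3/'
                    'no-solvent/11400/run0-clone*0.h5'))

# Create both readers back-to-back so the munging pipeline has the
# smallest possible window to append frames in between the two calls.
source = coor.source(files)
source_full = coor.source(files)

# Fail fast if the lengths drifted between the two calls (files grew);
# otherwise frames_from_files would index the wrong frames.
if not np.array_equal(source.trajectory_lengths(),
                      source_full.trajectory_lengths()):
    raise RuntimeError("trajectory lengths changed between readers; "
                       "files are probably still being written to")
```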
> Maybe just moving the definition of `source_full` from line 128 to be immediately after the definition of `source` on line 98 would solve that issue...
Worth a try, but maybe post an issue on the `pyemma` issue tracker? It would be useful for the `pyemma` tools to be robust to the number of frames increasing while a calculation is in progress.
As a temporary workaround, you could probably `rsync` the (`no-solvent`) `h5` and `pdb` files to a staging directory, like `/cbio/jclab/projects/fah/fah-msm/staging`, immediately before analyzing them. Each project is generally << 1 TB.
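Something like this, sketched with hypothetical paths (the 11400 project as in the report; adjust the destination to taste):

```python
import subprocess

# Hypothetical staging step: snapshot one project's no-solvent files
# before analysis, so the munging pipeline can't grow them mid-run.
src = '/cbio/jclab/projects/fah/fah-data/munged3/no-solvent/11400/'
dst = '/cbio/jclab/projects/fah/fah-msm/staging/11400/'

# rsync -a preserves timestamps/permissions; the trailing slashes mean
# "copy the contents of src into dst".
subprocess.check_call(['rsync', '-a', src, dst])
```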
Hmm... Indeed, when I come across this error again and look at `dtrajs.npy`:

```python
import numpy as np

dtrajs = np.load('dtrajs.npy')
for sublist in dtrajs:
    if not all(i >= 0 for i in sublist):
        print(sublist)
```

this prints:

```
[149 415 415 ..., 149 149 -1]
```
I think that's good evidence that this error occurs because the trajectory files are being written to while the script is running.
Are you waiting for me to report this to the `pyemma` issue tracker, or can one of you do that?
We are not waiting for you.
I'll report it now!