choderalab / fahmunge

Tools for Munging Folding@Home datasets
MIT License

Pipeline stalling on certain trajectories #43

Open rafwiewiora opened 7 years ago

rafwiewiora commented 7 years ago

So far I have seen stalling of the pipeline on two projects:

10490 - last output lines:

Stripping /data/choderalab/fah/munged3/all-atoms/10490/run28-clone0.h5
all-atom trajectory /data/choderalab/fah/munged3/all-atoms/10490/run28-clone0.h5 has 27400 frames
Found 685,675 filenames and 27400,27000 frames in /data/choderalab/fah/munged3/all-atoms/10490/run28-clone0.h5 and /data/choderalab/fah/munged3/no-solvent/10490/run28-clone0.h5, respectively.

I worked around this by removing 10490 from projects.csv (the pipeline now runs with projects.csv.no10490). Apparently it's OK to leave it this way, since the project is inactive.

10494 - last output lines:

Stripping /data/choderalab/fah/munged3/all-atoms/10494/run11-clone35.h5
all-atom trajectory /data/choderalab/fah/munged3/all-atoms/10494/run11-clone35.h5 has 3360 frames
Found 168,148 filenames and 3360,2960 frames in /data/choderalab/fah/munged3/all-atoms/10494/run11-clone35.h5 and /data/choderalab/fah/munged3/no-solvent/10494/run11-clone35.h5, respectively.
Stripping /data/choderalab/fah/munged3/all-atoms/10494/run11-clone36.h5
all-atom trajectory /data/choderalab/fah/munged3/all-atoms/10494/run11-clone36.h5 has 3320 frames
Found 166,148 filenames and 3320,2960 frames in /data/choderalab/fah/munged3/all-atoms/10494/run11-clone36.h5 and /data/choderalab/fah/munged3/no-solvent/10494/run11-clone36.h5, respectively.

This was resolved by deleting both /data/choderalab/fah/munged3/all-atoms/10494/run11-clone36.h5 and /data/choderalab/fah/munged3/no-solvent/10494/run11-clone36.h5 and letting the pipeline regenerate them.
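For anyone who hits the same stall, the manual cleanup above could be scripted roughly as follows. This is only a sketch: `remove_munged_pair` is a hypothetical helper, not part of fahmunge, and it assumes the `<root>/all-atoms/<project>/runR-cloneC.h5` and `<root>/no-solvent/<project>/runR-cloneC.h5` layout seen in the log paths.

```python
from pathlib import Path


def remove_munged_pair(root, project, run, clone):
    """Delete the all-atoms and no-solvent HDF5 files for one trajectory.

    Hypothetical helper (not part of fahmunge). The directory layout is an
    assumption taken from the paths in the log output above.
    Returns the list of paths that were actually removed.
    """
    name = f"run{run}-clone{clone}.h5"
    removed = []
    for subdir in ("all-atoms", "no-solvent"):
        path = Path(root) / subdir / str(project) / name
        if path.exists():
            path.unlink()  # the pipeline should regenerate this file on its next pass
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Example: the trajectory that stalled in project 10494.
    for p in remove_munged_pair("/data/choderalab/fah/munged3", 10494, 11, 36):
        print("removed", p)
```

Removing both files together matters here, because the stall showed up when the two copies disagreed on frame counts (3320 vs. 2960).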

"We should add a watchdog timer that kills one of these when it gets stuck." as @jchodera suggested.