glotzerlab / signac-flow

Workflow management for signac-managed data spaces.
https://signac.io/
BSD 3-Clause "New" or "Revised" License

adding potential 'Corrupted' state for job in status check and allow submit for unaffected jobs #261

Open zhou-pj opened 4 years ago

zhou-pj commented 4 years ago

Feature description

This suggestion came out of a discussion with @lyrivera about a recent /scratch file system breakdown that left some files in my workspace inaccessible. When I run python project.py status or python project.py submit, the whole command fails if any job's files are inaccessible. It would be great to have something similar to signac's project.check() incorporated here, so that the status check can continue and label the affected jobs CORRUPTED, and the submit process can pick out the unaffected jobs and keep working.
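To make the request concrete, here is a rough sketch of the behavior I have in mind. It uses only the standard library and is not signac-flow code. The function name partition_jobs is hypothetical, and the sketch assumes each job directory in the workspace holds a signac_statepoint.json file that must be readable for the job to count as healthy.

```python
import json
from pathlib import Path


def partition_jobs(workspace):
    """Split job directories into readable and corrupted ones.

    Hypothetical helper, not part of signac or signac-flow.
    Assumes each job directory contains 'signac_statepoint.json'.
    """
    ok, corrupted = [], []
    for job_dir in sorted(Path(workspace).iterdir()):
        try:
            if not job_dir.is_dir():
                continue  # ignore stray files in the workspace
            with open(job_dir / "signac_statepoint.json") as file:
                json.load(file)
        except (OSError, ValueError) as error:
            # OSError: missing or unreadable file (e.g. file system failure)
            # ValueError: file is readable but is not valid JSON
            corrupted.append((job_dir.name, repr(error)))
        else:
            ok.append(job_dir.name)
    return ok, corrupted
```

With something like this, status could report the entries in corrupted as CORRUPTED instead of aborting, and submit could operate only on the entries in ok.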

Additional context

The related /scratch system incident that sparked this need: https://portal.tacc.utexas.edu/user-news/-/news/103216

lyrivera commented 4 years ago

I'd like to contribute to this once a decision has been made on how to proceed.

csadorf commented 4 years ago

@lyrivera I don't think anything should stop you and @zhou-pj from going ahead with this. Please feel free to either propose a more detailed plan on how to achieve this that can be discussed on this issue or provide a draft implementation directly.

lyrivera commented 4 years ago

@csadorf Thanks, we will discuss this further and come up with a plan.