Open zhou-pj opened 4 years ago
I'd like to contribute to this once a decision has been made on how to proceed.
@lyrivera I don't think anything should stop you and @zhou-pj from going ahead with this. Please feel free to either propose a more detailed plan on how to achieve this that can be discussed on this issue or provide a draft implementation directly.
@csadorf Thanks, we will discuss this further and come up with a plan.
Feature description
This suggestion came up in a discussion with @lyrivera about a recent /scratch file system breakdown that left some files in my workspace inaccessible. When I run python project.py status or python project.py submit, both commands fail if any job file is inaccessible. It would be great to have something similar to signac's project.check() incorporated here, so that the status check can continue and label the affected jobs CORRUPTED, and the submit process can pick the unaffected jobs and continue to work.
Additional context
The related /scratch system incident that sparked this need: https://portal.tacc.utexas.edu/user-news/-/news/103216
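To make the request in the feature description more concrete, here is a rough sketch of the kind of check I have in mind. It is not based on the actual signac or signac-flow internals: the function name partition_jobs is hypothetical, and I am assuming a workspace laid out as one directory per job, each holding a JSON state point file (signac_statepoint.json by default here). Jobs whose state point cannot be read or parsed are collected separately instead of raising an error.

```python
import json
import os


def partition_jobs(workspace, statepoint_name="signac_statepoint.json"):
    """Split job directories into readable and corrupted ones.

    Hypothetical helper, not part of signac or signac-flow. Assumes each
    job directory holds a JSON state point file named `statepoint_name`.
    """
    ok, corrupted = [], []
    for job_id in sorted(os.listdir(workspace)):
        job_dir = os.path.join(workspace, job_id)
        if not os.path.isdir(job_dir):
            continue  # skip stray files in the workspace root
        try:
            with open(os.path.join(job_dir, statepoint_name)) as fh:
                json.load(fh)
        except (OSError, ValueError):
            # OSError: file missing or unreadable (e.g. a file system fault)
            # ValueError: file present but not valid JSON
            corrupted.append(job_id)
        else:
            ok.append(job_id)
    return ok, corrupted
```

With something like this, status could report the second list as CORRUPTED and submit could operate only on the first list, rather than both commands aborting on the first unreadable job.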