Mizzou-CBMI / COSMOS2

Python Scientific Pipeline Management System
GNU General Public License v3.0

defend against Error 26 condition that kills jobs before they are scheduled [do not merge] #63

mdpearson closed 7 years ago

mdpearson commented 7 years ago
  1. Add a new Task.scheduling_error field.
  2. Set this field if qacct shows the errors we've been seeing this past week.
  3. If and when _process_finished_tasks() sees this flag: (a) immediately log a warning, (b) wait 5–10 seconds, (c) clear the flag, and (d) set the Task's status to no_attempt so that it gets retried.
  4. Use "suspicious" instead of "corrupt" to describe unexpected qacct output.
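
For discussion, here is a rough sketch of the flow in steps 1 to 3. It is not the code in this PR: the `Task` and `TaskStatus` classes are simplified stand-ins, `looks_like_scheduling_error()` and `process_finished_task()` are hypothetical helpers, and the check assumes a parsed qacct record is a dict whose `failed` value starts with the code `26`.

```python
import enum
import logging
import random
import time
from dataclasses import dataclass

log = logging.getLogger("scheduling_error_sketch")


class TaskStatus(enum.Enum):
    # Simplified stand-in; the real status enum has more members.
    no_attempt = "no_attempt"
    submitted = "submitted"


@dataclass
class Task:
    # Hypothetical, stripped-down Task with the new field from step 1.
    id: int
    status: TaskStatus = TaskStatus.submitted
    scheduling_error: bool = False


def looks_like_scheduling_error(qacct_fields):
    """Assumption: 'failed' is a string such as '26 : ...' in the parsed record."""
    tokens = str(qacct_fields.get("failed", "")).split()
    return bool(tokens) and tokens[0] == "26"


def process_finished_task(task, qacct_fields, sleep=time.sleep):
    """Return True if the task was reset for a retry, False otherwise."""
    # Step 2: set the flag when qacct reports the error.
    if looks_like_scheduling_error(qacct_fields):
        task.scheduling_error = True

    # Step 3: react to the flag.
    if task.scheduling_error:
        log.warning("task %s was killed before scheduling; retrying", task.id)
        sleep(random.uniform(5, 10))         # (b) wait 5 to 10 seconds
        task.scheduling_error = False        # (c) clear the flag
        task.status = TaskStatus.no_attempt  # (d) mark for retry
        return True
    return False
```

The `sleep` parameter is injectable only so the delay can be skipped in tests; a real implementation would more likely avoid blocking the main loop for every affected task.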
mdpearson commented 7 years ago

Erik, I'd like to test this some more before you merge, but as always I appreciate your feedback and advice. Thanks.