I believe that if a job fails for whatever reason (job error code not one of the above), the job id may still be assigned in the queue, but with a job_state indicating that the job should potentially be re-submitted by the cron job.
ex: https://github.com/CliMA/slurm-buildkite/blob/master/bin/poll.py#L15
would return all job states. Canceled and failed jobs should potentially be re-submitted in certain circumstances (ex. invalid wait state?); the Buildkite service should then handle the state transition properly for us, based on the build-runner error codes (command failure or build runner failure).
This might be the easier workaround for https://github.com/CliMA/slurm-buildkite/issues/3
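
To make the re-submission idea concrete, here is a rough sketch of the filtering step, not the actual `poll.py` logic. It assumes the poll result is available as a mapping from job id to a Slurm-style state string; the `jobs_to_resubmit` helper and the `RESUBMIT_STATES` set are hypothetical names, and which states really warrant re-submission is still an open question.

```python
# Hypothetical sketch: not taken from poll.py.
# Assumption: job states arrive as a {job_id: state} mapping with Slurm-style names.
RESUBMIT_STATES = {"CANCELLED", "FAILED"}  # assumed; adjust once the rules are settled


def jobs_to_resubmit(job_states):
    """Return the sorted job ids whose state suggests the cron job should re-submit them."""
    selected = []
    for job_id, state in job_states.items():
        # A state may carry a suffix such as "CANCELLED by 1234"; compare the first word only.
        words = state.split()
        if words and words[0] in RESUBMIT_STATES:
            selected.append(job_id)
    return sorted(selected)
```

For instance, calling `jobs_to_resubmit({"101": "COMPLETED", "102": "FAILED"})` would select only job `102`; the cron job would then re-submit that id and leave the state transition to Buildkite.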