CliMA / slurm-buildkite

Run buildkite jobs on a slurm cluster

filter squeue IDs by pending / running / completed states #5

Closed: jakebolewski closed this issue 4 years ago

jakebolewski commented 4 years ago

I believe that if a job fails for whatever reason (a job error code other than the states listed above), its job ID may still be listed in the queue, but with a job_state indicating that the job should potentially be re-submitted by the cron job.
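A minimal sketch of the per-job decision described above, assuming the cron job sees each Slurm job state as a string such as those printed by `squeue --format=%T`. `Action`, `classify_state`, and the grouping of states are hypothetical illustrations, not what slurm-buildkite actually does:

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"          # still queued or running: leave it alone
    DONE = "done"          # finished successfully: nothing to do
    RESUBMIT = "resubmit"  # anything else: candidate for resubmission


# Assumed set of Slurm states meaning "still in progress" (not taken from slurm-buildkite).
IN_PROGRESS_STATES = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING", "SUSPENDED"}


def classify_state(job_state: str) -> Action:
    """Decide what a (hypothetical) cron job should do with one Slurm job state."""
    tokens = job_state.split()
    # Keep only the first token, e.g. "CANCELLED by 1234" -> "CANCELLED".
    state = tokens[0].upper() if tokens else ""
    if state in IN_PROGRESS_STATES:
        return Action.WAIT
    if state == "COMPLETED":
        return Action.DONE
    # FAILED, CANCELLED, TIMEOUT, NODE_FAIL, empty or unknown states, ...
    return Action.RESUBMIT


if __name__ == "__main__":
    for s in ["PENDING", "COMPLETED", "FAILED", "CANCELLED by 1234"]:
        print(s, "->", classify_state(s).value)
```

Treating every state that is not pending/running/completed as a resubmission candidate mirrors the "not one of the above" reasoning; a real cron job might want to exclude CANCELLED so that jobs cancelled on purpose stay cancelled.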

For example, the query at https://github.com/CliMA/slurm-buildkite/blob/master/bin/poll.py#L15 would return jobs in all states.

Canceled or failed jobs should potentially be re-submitted in certain circumstances (e.g. an invalid wait state?), and the Buildkite service should then handle the state transition properly for us, based on the build-runner error codes (command failure or build-runner failure).
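A sketch of querying Slurm for job IDs grouped by state, assuming the `squeue` CLI is available on the host running the cron job. The flags used (`--noheader`, `--states`, `--user`, `--format` with `%i` for job ID and `%T` for the extended job state) are standard `squeue` options; `query_squeue` and `parse_squeue` are hypothetical helpers, not the code in poll.py:

```python
import subprocess
from collections import defaultdict
from typing import Dict, List


def parse_squeue(output: str) -> Dict[str, List[str]]:
    """Group job IDs by state from `squeue --noheader --format="%i %T"` output."""
    by_state: Dict[str, List[str]] = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        job_id, state = fields[0], fields[1]
        by_state[state].append(job_id)
    return dict(by_state)


def query_squeue(user: str) -> Dict[str, List[str]]:
    # --states=all also reports jobs that recently left the queue (e.g. FAILED,
    # CANCELLED); without it squeue only reports pending, running and completing jobs.
    result = subprocess.run(
        ["squeue", "--noheader", "--states=all", f"--user={user}", "--format=%i %T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue(result.stdout)
```

Finished jobs only stay visible to `squeue` for a short time (governed by `MinJobAge` in slurm.conf), so a cron job that polls infrequently may need `sacct` to see older failures.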

This might be an easier workaround for https://github.com/CliMA/slurm-buildkite/issues/3.

simonbyrne commented 4 years ago

Is this now done by https://github.com/CliMA/slurm-buildkite/blob/fb7cb102a0746182b912c279f2853e66aaad69ae/bin/poll.py#L31?

jakebolewski commented 4 years ago

This is on the Slurm side; the two can diverge.
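Because Buildkite's view (a job still scheduled) and Slurm's view (the job that was meant to run it has failed or disappeared) are separate sources of truth, one option is to cross-check them. A minimal sketch, assuming the cron job keeps a hypothetical mapping from each still-scheduled Buildkite job ID to the Slurm job ID submitted for it (slurm-buildkite may not record this), plus a Slurm job ID to state map such as one built from `squeue` output:

```python
from typing import Dict, List

# Assumed states in which a Slurm job may still pick up or run its Buildkite job.
LIVE_STATES = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING", "SUSPENDED"}


def find_diverged(
    buildkite_to_slurm: Dict[str, str],
    slurm_states: Dict[str, str],
) -> List[str]:
    """Return Buildkite job IDs whose Slurm job is gone or no longer live.

    buildkite_to_slurm: hypothetical record of the Slurm job submitted for each
        Buildkite job that Buildkite still reports as scheduled.
    slurm_states: Slurm job ID -> job state.
    """
    diverged = []
    for bk_id, slurm_id in buildkite_to_slurm.items():
        state = slurm_states.get(slurm_id)
        # Missing from the queue, or in any non-live state (including COMPLETED,
        # since Buildkite still expects the job to run), counts as divergence.
        if state is None or state not in LIVE_STATES:
            diverged.append(bk_id)
    return sorted(diverged)
```

The returned Buildkite jobs are the ones for which the cron job could consider submitting a fresh Slurm job.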