CliMA / slurm-buildkite

Run buildkite jobs on a slurm cluster

add slurm job id to logs corresponding to sbatch submission to be able to track jobid <-> slurmjob id relationship #17

Closed: jakebolewski closed this issue 4 years ago

jakebolewski commented 4 years ago

This is useful for postmortems: it lets you check whether a particular buildkite job id was submitted successfully or experienced some failure on the slurm controller / squeue side of things.

simonbyrne commented 4 years ago

I set it as the jobid agent tag, which is visible under the "Timeline" tab. The problem, though, is that you won't see it until the slurm job starts.

simonbyrne commented 4 years ago

See here: https://github.com/CliMA/slurm-buildkite/blob/640960edaecff04da5b5591e965a652eaa0a5bcf/bin/slurmjob.sh#L12

We could rename it to slurm_jobid to be more descriptive.

jakebolewski commented 4 years ago

That attaches it to the agent. This would be more for the poll log output, to make it easier to grep (checking buildid, jobid, and slurm job id together), since in the deadlock scenario the job was never scheduled to a live agent.

It's printed to stderr by sbatch, so the info is there; it's just a little difficult to search for.
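A minimal sketch of that search, assuming the log captured sbatch's default confirmation line ("Submitted batch job <id>") somewhere in its text. The helper name find_slurm_job_ids is hypothetical and not part of slurm-buildkite:

```python
from __future__ import annotations

import re

# sbatch's default confirmation message is "Submitted batch job <id>".
SUBMITTED_RE = re.compile(r"Submitted batch job (\d+)")


def find_slurm_job_ids(log_text: str) -> list[int]:
    """Return every Slurm job ID announced by sbatch in the given log text, in order."""
    return [int(m.group(1)) for m in SUBMITTED_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = "polling...\nSubmitted batch job 4242\nSubmitted batch job 4243\n"
    print(find_slurm_job_ids(sample))
```

This only recovers the slurm job ids; linking them back to a buildkite job still depends on whatever else the log prints near each line, which is the gap the issue is about.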

simonbyrne commented 4 years ago

We could put it in the cron log (since it has access to all three ids).
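One way the cron log could carry all three ids, sketched under assumptions: emit a single key=value line per submission, so grepping for any one id also shows the other two. The key names and the helper format_submission_log are hypothetical, not something the repository defines:

```python
def format_submission_log(build_id: str, job_id: str, slurm_job_id: int) -> str:
    """Build one greppable line linking a Buildkite build, Buildkite job, and Slurm job.

    Hypothetical key=value layout: every submission produces exactly one line,
    so any of the three ids can be grepped for and the other two appear beside it.
    """
    return (
        f"buildkite_build_id={build_id} "
        f"buildkite_job_id={job_id} "
        f"slurm_job_id={slurm_job_id}"
    )


if __name__ == "__main__":
    print(format_submission_log("build-1", "job-1", 4242))
```

With this layout, something like `grep slurm_job_id=4242` on the cron log would return the matching buildkite ids even if the job never reached a live agent.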

simonbyrne commented 4 years ago

If you use sbatch --parsable, it prints just the slurm job id.
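A minimal Python sketch of capturing that output; parse_parsable and submit are hypothetical helpers, not part of slurm-buildkite. Per the sbatch documentation, --parsable prints the job id followed by ";<cluster name>" when a cluster name is present, so the parser keeps only the part before the semicolon:

```python
import subprocess


def parse_parsable(stdout: str) -> int:
    """Extract the Slurm job ID from `sbatch --parsable` output.

    The output is the job ID, optionally followed by ";<cluster name>".
    """
    return int(stdout.strip().split(";", 1)[0])


def submit(script_path: str) -> int:
    """Submit a batch script and return its Slurm job ID (requires sbatch on PATH)."""
    result = subprocess.run(
        ["sbatch", "--parsable", script_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sbatch rejects the submission
    )
    return parse_parsable(result.stdout)


if __name__ == "__main__":
    # Parsing only; calling submit() needs a real Slurm installation.
    print(parse_parsable("123456\n"))
    print(parse_parsable("123456;cluster1\n"))
```

Because the id comes back as a plain return value, the caller can log it next to the buildkite ids at submission time instead of searching sbatch's free-form output later.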