Polling job state with squeue rather than sacct would be preferred (per Jeff): squeue is more accurate and updates faster than sacct, which can take up to ten minutes to reflect a state change.
Job status stays visible in squeue for five minutes after a job exits (per Jeff), so catching a job's completion or failure shouldn't be a problem.
Also, we should adjust the polling mechanism (Job._system()) to handle conditions that SPP doesn't check for, such as the OUT_OF_MEMORY job state. See:
https://hpc-unibe-ch.github.io/slurm/monitoring-jobs.html
https://slurm.schedmd.com/squeue.html#SECTION_JOB-STATE-CODES
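
A rough sketch of what squeue-based polling with broader state handling might look like. This is not the existing Job._system() code: the names classify_state and query_squeue and the done/failed/active/unknown buckets are hypothetical, and deciding which Slurm states count as failures is an assumption we would need to agree on. The squeue flags used (-h for no header, -j for the job ID, -o %T for the extended state name) are standard.

```python
import subprocess

# Assumption: terminal states we would treat as failures. The names come
# from the Slurm job state codes list; the grouping is our own choice.
FAILURE_STATES = {
    "BOOT_FAIL", "CANCELLED", "DEADLINE", "FAILED",
    "NODE_FAIL", "OUT_OF_MEMORY", "PREEMPTED", "TIMEOUT",
}


def classify_state(state):
    """Map a squeue state string to 'done', 'failed', 'active' or 'unknown'."""
    state = state.strip().upper()
    if not state:
        # squeue no longer knows the job (e.g. more than a few minutes
        # after exit); the caller could fall back to sacct here.
        return "unknown"
    if state == "COMPLETED":
        return "done"
    if state in FAILURE_STATES:
        return "failed"
    # PENDING, RUNNING, COMPLETING, etc. are treated as still in progress.
    return "active"


def query_squeue(job_id, run=subprocess.run):
    """Return the state squeue reports for job_id, or '' if it reports none.

    `run` is injectable so the function can be exercised without Slurm.
    Only the first reported state is used, so job arrays would need
    extra handling.
    """
    result = run(
        ["squeue", "-h", "-j", str(job_id), "-o", "%T"],
        capture_output=True, text=True, check=False,
    )
    tokens = result.stdout.split()
    return tokens[0] if tokens else ""
```

The polling loop would then call classify_state(query_squeue(job_id)) on each tick and stop once the result is anything other than 'active'.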