biocore / mg-scripts

Knight Lab internal Metagenomic processing scripts for demultiplexing, QC and host removal
BSD 3-Clause "New" or "Revised" License

Update job polling mechanism #134

Open charles-cowart opened 6 months ago

charles-cowart commented 6 months ago

Polling job state with squeue instead of sacct would be preferred (Jeff): squeue is more accurate and updates faster than sacct, which can take up to ten minutes to reflect a state change.

https://hpc-unibe-ch.github.io/slurm/monitoring-jobs.html

Job status remains available in squeue for five minutes after a job exits (Jeff), so catching a job's completion or failure shouldn't be a problem as long as we poll more often than that.
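A rough sketch of what squeue-based polling could look like, to make the idea concrete. The function names (`query_job_state`, `wait_for_job`), the `ACTIVE_STATES` set, and the default interval are all hypothetical and not taken from the current code. It assumes `squeue -h -j <id> -t all -o %T` prints one state per line, and that an empty result or non-zero exit means the job has already aged out of squeue.

```python
import subprocess
import time

# Assumption: states in which the job is still in flight and worth re-polling.
ACTIVE_STATES = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING", "SUSPENDED"}


def query_job_state(job_id, run=subprocess.run):
    """Return the state squeue reports for job_id, or None if it is not listed.

    -h drops the header, -t all includes finished jobs, -o %T prints the
    extended state name (e.g. RUNNING, COMPLETED, OUT_OF_MEMORY).
    """
    result = run(
        ["squeue", "-h", "-j", str(job_id), "-t", "all", "-o", "%T"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # e.g. the job id is no longer known to the controller
        return None
    lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
    # Job arrays may yield several lines; this sketch only looks at the first.
    return lines[0] if lines else None


def wait_for_job(job_id, interval=30, max_polls=1000,
                 run=subprocess.run, sleep=time.sleep):
    """Poll until the job leaves ACTIVE_STATES; return the last state seen."""
    last_state = None
    for _ in range(max_polls):
        state = query_job_state(job_id, run=run)
        if state is None:
            # Job aged out of squeue; report whatever we saw last.
            return last_state
        last_state = state
        if state not in ACTIVE_STATES:
            return state
        sleep(interval)
    return last_state
```

The `run` and `sleep` parameters are only there so the loop can be exercised without a Slurm cluster. The interval needs to stay well under the five-minute retention window mentioned above, otherwise a terminal state could be missed.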

Also, we should adjust the polling mechanism (Job._system()) to handle job states that SPP doesn't currently check for, such as OUT_OF_MEMORY. See:

https://slurm.schedmd.com/squeue.html#SECTION_JOB-STATE-CODES
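For discussion, one possible way to bucket the state codes from that page so that failures like OUT_OF_MEMORY are not silently treated as "still running". This is only a sketch: `classify_state` and the three sets are hypothetical names, not part of Job._system(), and the grouping (for example treating PREEMPTED as a failure) is an assumption we would want to agree on.

```python
# State names follow the "JOB STATE CODES" section of the squeue man page.
SUCCESS_STATES = {"COMPLETED"}

# Assumption: all of these should be surfaced to the caller as errors.
FAILURE_STATES = {
    "BOOT_FAIL", "CANCELLED", "DEADLINE", "FAILED", "NODE_FAIL",
    "OUT_OF_MEMORY", "PREEMPTED", "REVOKED", "TIMEOUT",
}

# Assumption: the job may still make progress, so keep polling.
ACTIVE_STATES = {
    "PENDING", "CONFIGURING", "RUNNING", "COMPLETING", "RESIZING",
    "REQUEUED", "REQUEUE_FED", "REQUEUE_HOLD", "RESV_DEL_HOLD",
    "SIGNALING", "STAGE_OUT", "STOPPED", "SUSPENDED",
}


def classify_state(raw_state):
    """Map a Slurm state string to 'succeeded', 'failed', 'active' or 'unknown'."""
    tokens = raw_state.split()
    if not tokens:
        return "unknown"
    # Keep only the first word and drop a trailing '+', in case the state
    # arrives in a decorated form such as 'CANCELLED by 1234' or 'CANCELLED+'.
    state = tokens[0].rstrip("+").upper()
    if state in SUCCESS_STATES:
        return "succeeded"
    if state in FAILURE_STATES:
        return "failed"
    if state in ACTIVE_STATES:
        return "active"
    return "unknown"


if __name__ == "__main__":
    for s in ("RUNNING", "COMPLETED", "OUT_OF_MEMORY", "TIMEOUT"):
        print(s, "->", classify_state(s))
```

Returning "unknown" rather than guessing means a state that is missing from these sets (SPECIAL_EXIT, for instance) gets flagged instead of being polled forever.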