getzlab / canine

A modular, high-performance computing solution to run jobs using SLURM
https://getzlab.github.io/canine/
BSD 3-Clause "New" or "Revised" License

Suspend job that's been preempted too often #136

Closed (julianhess closed this issue 1 year ago)

julianhess commented 1 year ago

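Note that Slurm does not distinguish between a job that was preempted and one that was manually requeued (e.g., because it exited with a failed state). The preemption limit must therefore be higher than the retry limit. A job that exceeds the retry limit through manual requeues is resubmitted to a non-preemptible partition, which resets its retry count. Because manual requeues also count toward the preemption tally, a lower preemption limit would suspend a repeatedly failing job before that resubmission could happen.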
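A minimal Python sketch of the limit check described above, under stated assumptions. It assumes the orchestrator tracks two counters: a total requeue count that lumps preemptions and manual requeues together (because Slurm cannot tell them apart), and a retry count of manual requeues after failed exits that is reset when the job moves to a non-preemptible partition. The names `RequeuePolicy`, `next_action`, and the returned action strings are hypothetical illustrations, not canine's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequeuePolicy:
    """Hypothetical limits for a requeueable job (names are illustrative)."""
    retry_limit: int       # manual requeues allowed after failed exits
    preemption_limit: int  # total requeues allowed before the job is suspended

    def __post_init__(self) -> None:
        # Slurm reports preemptions and manual requeues as one count, so a job
        # that only ever fails must reach retry_limit before preemption_limit.
        if self.preemption_limit <= self.retry_limit:
            raise ValueError("preemption_limit must be greater than retry_limit")


def next_action(policy: RequeuePolicy, requeue_count: int, retry_count: int) -> str:
    """Choose what to do after a job has been requeued.

    requeue_count: all requeues of the job (preemptions plus manual requeues),
        since Slurm does not distinguish them.
    retry_count: manual requeues issued after failed exits; assumed to be
        reset when the job is resubmitted to a non-preemptible partition.
    """
    if requeue_count > policy.preemption_limit:
        return "suspend"
    if retry_count > policy.retry_limit:
        return "resubmit_nonpreemptible"
    return "requeue"


if __name__ == "__main__":
    policy = RequeuePolicy(retry_limit=3, preemption_limit=10)
    # A job that only fails: both counters advance together.
    print(next_action(policy, requeue_count=4, retry_count=4))   # resubmit_nonpreemptible
    # A job that keeps getting preempted but rarely fails.
    print(next_action(policy, requeue_count=11, retry_count=1))  # suspend
```

The suspend check runs first on purpose: with `preemption_limit > retry_limit`, a job that only fails trips the retry limit (and moves partitions) before its combined count can reach the preemption limit, while a job that is genuinely preempted over and over is still suspended.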