Is it possible to submit a job to Slurm with `submitit.helpers.CommandFunction` and `submitit.AutoExecutor` in such a way that it is requeued on timeouts?
As mentioned in the docs, a Python function is requeued on timeout only if it implements a `checkpoint` method; otherwise, by default, `submitit` requeues only on preemption, not on timeout. However, I would like to submit a job through `CommandFunction`, and I need it to be requeued on timeout.
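To make the question concrete, here is a minimal sketch of one possible workaround: wrapping the command in a callable class that exposes a `checkpoint` method. The names `RequeueableCommand` and `submit_command` are hypothetical, and the sketch assumes that returning `submitit.helpers.DelayedSubmission` from `checkpoint` is what triggers a requeue on timeout, as described for Python callables in the checkpointing docs. It has not been checked against `CommandFunction` itself.

```python
import subprocess


class RequeueableCommand:
    """Hypothetical wrapper: runs a command and exposes a checkpoint method."""

    def __init__(self, command):
        # Command given as a list of arguments, e.g. ["python", "train.py"].
        self.command = list(command)

    def __call__(self):
        # Run the command, raise on a non-zero exit code, and return its stdout.
        result = subprocess.run(
            self.command, check=True, capture_output=True, text=True
        )
        return result.stdout

    def checkpoint(self, *args, **kwargs):
        # Imported lazily so the class can be defined without submitit installed.
        import submitit

        # Assumption: resubmitting the same callable with the same arguments
        # is enough; the command itself must handle resuming its own state.
        return submitit.helpers.DelayedSubmission(self, *args, **kwargs)


def submit_command(command, folder="submitit_logs", timeout_min=60):
    # Hypothetical helper showing how the wrapper would be submitted.
    import submitit

    executor = submitit.AutoExecutor(folder=folder)
    executor.update_parameters(timeout_min=timeout_min)
    return executor.submit(RequeueableCommand(command))
```

Note that in this sketch the command is simply restarted from scratch after a requeue, so it would need to save and reload its own progress to avoid repeating work.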
Thanks in advance!