facebookincubator / submitit

Python 3.8+ toolbox for submitting jobs to Slurm
MIT License

Requeueing on timeouts when launching jobs with CommandFunction #1732

Open Niccolo-Ajroldi opened 1 year ago


Is it possible to submit a job to Slurm with submitit.helpers.CommandFunction and submitit.AutoExecutor so that it is requeued on timeout?

As mentioned in the docs, a Python function is requeued on timeout only if it implements a checkpoint method; otherwise, submitit by default requeues only on preemption, not on timeout. However, I would like to submit a job through CommandFunction, and I need it to be requeued on timeout as well.
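To make the question concrete, here is a minimal sketch of the kind of workaround I have in mind. It assumes submitit only needs the submitted callable to expose a checkpoint method that returns a submitit.helpers.DelayedSubmission. The names RequeueOnTimeout and submit_command are hypothetical helpers of my own, not part of submitit, and I have not confirmed that this is the recommended approach or that the timeout signal is handled as expected while the command's subprocess is running.

```python
from typing import Any, Callable, List


class RequeueOnTimeout:
    """Hypothetical wrapper (not part of submitit): adds a `checkpoint` method to any callable."""

    def __init__(self, function: Callable[..., Any]) -> None:
        self.function = function

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Delegate to the wrapped callable, e.g. a CommandFunction instance.
        return self.function(*args, **kwargs)

    def checkpoint(self, *args: Any, **kwargs: Any) -> Any:
        # Imported lazily so the class can be defined without submitit installed.
        from submitit.helpers import DelayedSubmission

        # Resubmit the same wrapper with the same arguments.
        # Assumption: the command restarts from scratch and must resume from its own saved state.
        return DelayedSubmission(self, *args, **kwargs)


def submit_command(command: List[str], folder: str, timeout_min: int) -> Any:
    """Hypothetical helper: submit `command` wrapped so that it exposes `checkpoint`."""
    import submitit

    executor = submitit.AutoExecutor(folder=folder)
    executor.update_parameters(timeout_min=timeout_min)
    job_function = RequeueOnTimeout(submitit.helpers.CommandFunction(command))
    return executor.submit(job_function)
```

It would be used as, for example, submit_command(["python", "train.py"], "logs", 60), where train.py is a placeholder script. Since checkpoint here simply reruns the same command, the script itself would have to save and reload its own state for the requeue to be useful.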

Thanks in advance!