On E2 (BCH internal SLURM), there is a partition available for preemptible jobs. pman should support preemptible schedulers and retry jobs that get interrupted.
The same logic could also apply to Kubernetes, where a pod may need to be rescheduled under certain circumstances (node down, OOMKilled).
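
To make the request concrete, here is a minimal sketch of what scheduler-agnostic retry logic might look like. It is not pman's actual code: `JobResult`, `is_retryable`, `run_with_retries`, and the `submit` callable are hypothetical names, and the set of retryable reasons is an assumption. It combines the SLURM job states `PREEMPTED` and `NODE_FAIL` with the Kubernetes termination reason `OOMKilled`.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: termination reasons treated as "interrupted, worth retrying".
# PREEMPTED and NODE_FAIL are SLURM job states; OOMKilled is a Kubernetes
# container termination reason. A real implementation would map these per backend.
RETRYABLE_REASONS = frozenset({"PREEMPTED", "NODE_FAIL", "OOMKilled"})


@dataclass(frozen=True)
class JobResult:
    """Hypothetical summary of how one job attempt ended."""
    succeeded: bool
    reason: str = ""


def is_retryable(result: JobResult) -> bool:
    """Return True if the job failed for a reason the scheduler caused."""
    return not result.succeeded and result.reason in RETRYABLE_REASONS


def run_with_retries(submit: Callable[[], JobResult], max_attempts: int = 3) -> JobResult:
    """Call submit() until it succeeds, fails non-retryably, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    result = submit()
    attempts = 1
    while is_retryable(result) and attempts < max_attempts:
        result = submit()
        attempts += 1
    return result
```

For example, a job that is preempted once and then completes would be submitted twice, while a job that fails for an ordinary reason (such as a non-zero exit code) would not be resubmitted. Retrying after `OOMKilled` probably only helps if the retry also requests more memory, which this sketch does not attempt.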