Re-start failing SLURM jobs

celerity / slurmactiond

Schedule GitHub Actions jobs on a cluster through SLURM

MIT License


Re-start failing SLURM jobs #7

Status: closed by fknorr 1 year ago

fknorr commented 2 years ago

Related to #6: If a SLURM job dies, e.g. due to a job pickup timeout, we should restart it as long as the corresponding job is still queued on GitHub.
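The restart condition can be sketched as a small predicate. This is a minimal illustration, not slurmactiond's actual code. `TrackedJob`, `should_restart` and `jobs_to_restart` are hypothetical names. It assumes the daemon already knows each SLURM job's state, for example from sacct. It also assumes the `status` of the matching GitHub workflow job ("queued", "in_progress", "completed") is available. The set of "dead" SLURM states is an assumed subset of sacct's state codes.

```python
from dataclasses import dataclass

# Assumed subset of terminal SLURM states after which the runner process is gone.
SLURM_DEAD_STATES = {
    "FAILED", "TIMEOUT", "CANCELLED", "NODE_FAIL",
    "PREEMPTED", "BOOT_FAIL", "OUT_OF_MEMORY",
}


@dataclass
class TrackedJob:
    """Hypothetical pairing of a SLURM job with the GitHub job it should serve."""
    slurm_job_id: int
    slurm_state: str    # e.g. "RUNNING", "TIMEOUT", "CANCELLED by 1000"
    github_job_id: int
    github_status: str  # GitHub workflow job status: "queued", "in_progress", "completed"


def should_restart(job: TrackedJob) -> bool:
    """Restart only if the SLURM job died while its GitHub job is still waiting."""
    # sacct may append detail ("CANCELLED by 1000") or truncate to "CANCELLED+".
    state = job.slurm_state.partition(" ")[0].rstrip("+")
    return state in SLURM_DEAD_STATES and job.github_status == "queued"


def jobs_to_restart(jobs):
    """Select the tracked jobs whose SLURM allocation should be resubmitted."""
    return [job for job in jobs if should_restart(job)]
```

Checking the GitHub side as well as the SLURM side avoids resubmitting a runner for a job that another runner has already picked up or that has been cancelled on GitHub.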

fknorr commented 2 years ago

Partially addressed now: the runner retries failed GitHub API calls and config.sh / run.sh invocations. The server process should still be informed about irrecoverable errors, and an scancel of a job should trigger a restart.
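The retry behaviour described above could look roughly like the following sketch. It is illustrative only. `retry` and `IrrecoverableError` are hypothetical names, and the exponential backoff schedule is an assumption, not something the issue specifies. Transient failures are retried. Errors that retrying cannot fix, including exhausting all attempts, surface as `IrrecoverableError`, which is the point where the server process would be informed.

```python
import time


class IrrecoverableError(Exception):
    """Hypothetical: an error retrying cannot fix; the server should be notified."""


def retry(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds, backing off exponentially between tries.

    IrrecoverableError is propagated immediately; any other exception is treated
    as transient. After `attempts` failures, the last error is wrapped in an
    IrrecoverableError so the caller can report it upstream.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return operation()
        except IrrecoverableError:
            raise
        except Exception as exc:  # assumed transient (network error, script failure, ...)
            last_exc = exc
            if attempt + 1 < attempts:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    raise IrrecoverableError(f"gave up after {attempts} attempts") from last_exc
```

The same wrapper could be applied to an API request or to a `subprocess.run([...], check=True)` call that launches config.sh or run.sh, since a non-zero exit then raises an exception. Passing `sleep` as a parameter keeps the backoff testable without real delays.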