rkdarst closed this 2 years ago.
Hi @rkdarst. Thank you for this contribution. I just pushed a change that updates the Slurm version to 20.02. I also fixed some issues with starting the daemons correctly via supervisorctl.
Give it a try and let me know if it works better for you.
This PR makes the Slurm accounting database work (we had to run the command sacctmgr add cluster $CLUSTERNAME). According to the Slurm docs, this step is only needed for Slurm < 20.02, so it may be better to close this PR and upgrade Slurm instead.
However, if you do upgrade to Slurm >= 20.02, note that we found several race conditions where the daemons didn't start on time or in the right order (MySQL apparently had not been fully killed, so it took another few seconds for supervisord to bring everything into the RUNNING state). This may be less of a concern when the cluster is added automatically, but from what I just discovered, some issues might remain.
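One way to work around this kind of startup race is to poll a readiness check before running anything that talks to the accounting database. The sketch below is not part of this PR. The helper name `wait_until_ready` and the retry defaults are hypothetical, and using `sacctmgr -n list cluster` as a probe assumes that sacctmgr exits with a non-zero status while slurmdbd (or the MySQL server behind it) is still unreachable.

```python
import subprocess
import time
from typing import Sequence


def wait_until_ready(cmd: Sequence[str], attempts: int = 30, delay: float = 2.0) -> bool:
    """Run `cmd` repeatedly until it exits with status 0.

    Returns True as soon as one run succeeds, or False after `attempts`
    failed runs. A missing executable counts as a failed attempt.
    """
    for attempt in range(attempts):
        try:
            result = subprocess.run(
                list(cmd),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                check=False,
            )
            if result.returncode == 0:
                return True
        except FileNotFoundError:
            pass  # the client binary is not installed (yet); treat as "not ready"
        if attempt < attempts - 1:
            time.sleep(delay)  # give supervisord time to bring the daemons up
    return False


# Hypothetical usage (assumes sacctmgr fails while slurmdbd is down):
#   if not wait_until_ready(["sacctmgr", "-n", "list", "cluster"]):
#       raise SystemExit("slurmdbd did not come up in time")
```

Polling an actual client command, rather than sleeping for a fixed number of seconds, adapts to however long MySQL and slurmdbd happen to take on a given start.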
sacctmgr add cluster linux
needs to be run in order to add the cluster to the accounting database.
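Because running sacctmgr add cluster a second time for a name that already exists is unnecessary, a startup script for Slurm < 20.02 could check the cluster list first and only add the cluster when it is missing. This is a minimal sketch under stated assumptions, not the PR's actual implementation. The function names are hypothetical. The flags `-n` (no header), `-P` (parsable output) and `-i` (commit immediately without a confirmation prompt) are standard sacctmgr options. The `run` parameter exists only so the logic can be exercised without a real Slurm installation.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], subprocess.CompletedProcess]


def run_sacctmgr(args: List[str]) -> subprocess.CompletedProcess:
    """Run sacctmgr with the given arguments and capture its text output."""
    return subprocess.run(["sacctmgr", *args], capture_output=True, text=True, check=False)


def ensure_cluster_registered(name: str, run: Runner = run_sacctmgr) -> bool:
    """Add `name` to the accounting database unless it is already listed.

    Returns True if this call added the cluster, False if it was already
    registered. Raises RuntimeError if either sacctmgr call fails.
    """
    # -n: no header line, -P: parsable output, format=Cluster: names only
    listing = run(["-n", "-P", "list", "cluster", "format=Cluster"])
    if listing.returncode != 0:
        raise RuntimeError(f"listing clusters failed: {listing.stderr.strip()}")
    # Compare case-insensitively as a precaution
    existing = {line.strip().lower() for line in listing.stdout.splitlines() if line.strip()}
    if name.lower() in existing:
        return False
    # -i commits immediately instead of prompting for confirmation
    added = run(["-i", "add", "cluster", name])
    if added.returncode != 0:
        raise RuntimeError(f"adding cluster {name!r} failed: {added.stderr.strip()}")
    return True


# Hypothetical usage, e.g. from a supervisord-managed startup step:
#   ensure_cluster_registered("linux")
```

Checking before adding makes the step safe to rerun on every container start, which matters if supervisord restarts the setup script after one of the race conditions described above.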