giovtorres / docker-centos7-slurm

Slurm Docker Container on CentOS 7
MIT License

docker-entrypoint: create cluster before starting #22

Closed: rkdarst closed this 2 years ago

rkdarst commented 3 years ago

This makes the Slurm accounting database work (we had to run the command sacctmgr add cluster $CLUSTERNAME). According to the Slurm docs, this is only needed in Slurm < 20.02, so maybe it's better to close this and upgrade Slurm instead.
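For readers who want to see what "create cluster before starting" could look like in an entrypoint script, here is a minimal sketch. It is not the code from this PR: the function name ensure_cluster is made up for illustration, and it assumes slurmdbd is already up and reachable when it runs. It checks whether the cluster is already registered so that the script can be re-run safely.

```shell
# Sketch only: register the cluster with slurmdbd unless it is already known.
# ensure_cluster is a hypothetical helper name, not part of Slurm or this repo.
ensure_cluster() {
    local name
    name="$1"
    # List registered cluster names, one per line, with no header row.
    # grep -x matches the whole line and -F treats the name as a literal string.
    if sacctmgr --noheader --parsable2 show cluster format=cluster | grep -qxF "$name"; then
        echo "cluster '$name' already registered"
        return 0
    fi
    # --immediate commits the change without the interactive confirmation prompt.
    sacctmgr --immediate add cluster "$name"
}
```

In an entrypoint this would be called once, after slurmdbd is up and before slurmctld starts, for example as ensure_cluster "$CLUSTERNAME".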

However, if you do upgrade to Slurm >= 20.02, note that we found several race conditions where the daemons didn't start up in time or in the right order (mysql was apparently not fully killed, so it took another few seconds for supervisord to bring everything into the RUNNING state). This may be less of a concern when the cluster gets added automatically, but from what I just discovered there might still be some issues.
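One common way to paper over this kind of startup race is to poll until a dependency is actually ready rather than assuming it is. The helper below is a generic sketch of that idea, not something taken from this repository; the name wait_for and its argument order are assumptions chosen for illustration.

```shell
# Sketch only: retry a command until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    local attempts delay i
    attempts="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Discard the command's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "timed out waiting for: $*" >&2
    return 1
}
```

For example, wait_for 30 1 mysqladmin ping would block until the database answers, giving up after roughly 30 seconds. The same helper could wrap a check on supervisorctl status output, though the program names to check depend on the supervisord configuration and are not shown in this thread.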

giovtorres commented 2 years ago

Hi @rkdarst. Thank you for this contribution. I just pushed a change to update the Slurm version to 20.02. I also addressed some issues with starting up the daemons correctly via supervisorctl.

Give it a try and let me know if it works better for you.