giovtorres / docker-centos7-slurm

Slurm Docker Container on CentOS 7
MIT License

slurmctld not running #2

Closed: BerndDoser closed this issue 6 years ago

BerndDoser commented 6 years ago

Hi,

When I start the container with the command

docker run -it -h ernie giovtorres/docker-centos7-slurm:17.02.9

the 'slurmctld' program is not running:

[root@ernie /]# supervisorctl status
munged                           RUNNING   pid 263, uptime 0:00:17
mysqld                           RUNNING   pid 491, uptime 0:00:13
slurmctld                        EXITED    Nov 13 12:50 PM
slurmd                           RUNNING   pid 262, uptime 0:00:17
slurmdbd                         RUNNING   pid 266, uptime 0:00:17

I am using Docker 17.05.0-ce on Ubuntu 16.04.

Best regards, Bernd

giovtorres commented 6 years ago

Hi Bernd,

It appears the database starts a little more slowly on Ubuntu 16.04 with Docker 17.06.2-ce (I couldn't find the deb package for 17.05.0-ce, but I was still able to reproduce the problem). slurmctld fails to start if the database is not ready. If you restart slurmctld (supervisorctl restart slurmctld), it should start successfully.
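As a stopgap, the manual restart can be scripted. The following is only a rough Python sketch, not something shipped in the image: it parses the table printed by supervisorctl status (in the format shown earlier in this thread) and restarts whatever is not RUNNING. The helper names not_running and restart_exited are made up for illustration.

```python
import subprocess
from typing import List


def not_running(status_output: str) -> List[str]:
    """Return program names whose state column is anything other than RUNNING."""
    names = []
    for line in status_output.splitlines():
        fields = line.split()
        # Expected layout: <name> <STATE> <details...>
        if len(fields) >= 2 and fields[1] != "RUNNING":
            names.append(fields[0])
    return names


def restart_exited() -> List[str]:
    """Restart every program that supervisorctl reports as not RUNNING."""
    # The exit code is ignored here; only the printed table is parsed.
    result = subprocess.run(
        ["supervisorctl", "status"], capture_output=True, text=True
    )
    names = not_running(result.stdout)
    for name in names:
        subprocess.run(["supervisorctl", "restart", name], check=False)
    return names
```

Calling restart_exited() inside the container a few seconds after boot would have the same effect as typing supervisorctl restart slurmctld by hand. It does not address the underlying ordering problem.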

I made some small adjustments in f34eb3c. In a few minutes, you should be able to pull docker.io/giovtorres/docker-centos7-slurm:latest, which will contain Slurm 17.02.9. Let me know if you still have the same issue.

Thanks, Giovanni

BerndDoser commented 6 years ago

Thank you for your answer. Restarting slurmctld works fine. However, 'docker.io/giovtorres/docker-centos7-slurm:latest', which contains Slurm 17.11.0-0rc3, still fails to start slurmctld at boot.

Best regards, Bernd

giovtorres commented 6 years ago

Did you run docker pull again? The image corresponding to the latest tag changed with this fix. You'll have to pull down the updated image.

BerndDoser commented 6 years ago

Yes, I pulled the latest tag again. It contains your changes, but it is Slurm 17.11.0-0rc3, not 17.02.9 as you mentioned above.

giovtorres commented 6 years ago

You're right! I forgot to update the ENV after updating the supervisor.conf file. I pushed updated tags, and the image just finished building on Docker Hub. Give it a try now.

BerndDoser commented 6 years ago

The version is now the one you mentioned, but slurmctld is still not running after boot. I think we need something like an event listener to ensure that slurmctld starts only after slurmdbd is ready.
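To make the event listener idea concrete, here is a minimal Python sketch of the supervisor event listener protocol: the listener prints READY, reads a header line and a payload from stdin, and answers with RESULT. It is an illustration only, not code from this repository. The function names (parse_tokens, run_listener, start_slurmctld) are hypothetical, and the choice to react to PROCESS_STATE_RUNNING for slurmdbd is an assumption about how one might wire it up.

```python
import subprocess
import sys


def parse_tokens(line):
    """Parse a 'key:value key:value' line into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def start_slurmctld():
    # stdout carries the listener protocol, so command output goes to stderr.
    subprocess.run(
        ["supervisorctl", "start", "slurmctld"], stdout=sys.stderr, check=False
    )


def run_listener(stdin, stdout, on_ready, watched="slurmdbd"):
    """Call on_ready() each time the watched program enters RUNNING."""
    while True:
        stdout.write("READY\n")  # tell supervisord we can accept an event
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:  # EOF: supervisord closed the pipe
            return
        headers = parse_tokens(header_line)
        # 'len' is the payload size; the payloads handled here are ASCII.
        payload = stdin.read(int(headers["len"]))
        if headers["eventname"] == "PROCESS_STATE_RUNNING":
            if parse_tokens(payload).get("processname") == watched:
                on_ready()
        stdout.write("RESULT 2\nOK")  # acknowledge the event
        stdout.flush()
```

A script that calls run_listener(sys.stdin, sys.stdout, start_slurmctld) could be registered in an [eventlistener:...] section with events=PROCESS_STATE_RUNNING, with slurmctld set to autostart=false. Note that supervisor's RUNNING state only means the process stayed up past its start period, not that slurmdbd has finished connecting to the database, so this narrows the race rather than eliminating it.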

giovtorres commented 6 years ago

Yeah, I think so. I did something similar in a different project. I was looking for other ways and came across this, but it wasn't very encouraging.

Feel free to submit a PR. Otherwise, I'll look at the event listener docs over the next day or so.

Thanks.

giovtorres commented 6 years ago

I refactored the startup commands in ad4d01e. I tested it on both CentOS 7 and Ubuntu 16.04, and everything starts up properly on both. I think this approach is better because it doesn't depend on an arbitrary number of seconds of sleep. Give it a try and let me know.
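For readers wondering what "not depending on sleep" can look like in general, one common pattern is to poll until a service accepts TCP connections, with an upper time limit. The Python sketch below illustrates that pattern only; it is not the contents of ad4d01e. The function name wait_for_port is made up, and the ports mentioned afterwards (3306 for MySQL, 6819 for slurmdbd) are the upstream defaults, assumed rather than confirmed for this image.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Poll until host:port accepts a TCP connection.

    Returns True once a connection succeeds, or False if `timeout`
    seconds pass without one.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)  # brief pause before the next attempt
```

A startup script could call wait_for_port("127.0.0.1", 3306) before launching slurmdbd, and wait_for_port("127.0.0.1", 6819) before launching slurmctld. An open port shows only that the daemon is listening, so a stricter check (for example, running a real query) may still be needed.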

BerndDoser commented 6 years ago

Works perfectly! Thank you very much!