giovtorres / docker-centos7-slurm

Slurm Docker Container on CentOS 7
MIT License

Update dockerhub to match README #50

Closed asmacdo closed 11 months ago

asmacdo commented 11 months ago

Following the Docker Hub README, I couldn't get the cluster to start.

README (Works great, thanks!): docker run -it -h slurmctl --cap-add sys_admin giovtorres/docker-centos7-slurm:latest

Dockerhub (Failure output below): docker run -it -h ernie giovtorres/docker-centos7-slurm:latest

This fails. Judging by the repeated 'Unable to resolve "slurmctl"' errors in the output below, I suppose the hostname must be slurmctl.

- Initializing database
- Database initialized
- Starting MariaDB to create Slurm account database
231031 13:57:20 mysqld_safe Logging to '/var/log/mariadb/mariadb.log'.
231031 13:57:20 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
- Starting MariaDB to create Slurm account database
- Creating Slurm acct database
- Slurm acct database created. Stopping MariaDB
- Starting supervisord process manager
- Starting munged
munged: started
- munged is in the RUNNING state.
- Starting mysqld
mysqld: started
- mysqld is in the RUNNING state.
- Starting slurmdbd
slurmdbd: started
- slurmdbd is in the RUNNING state.
- Starting slurmctld
slurmctld: ERROR (spawn error)
- slurmctld is in the BACKOFF state.
- slurmctld is in the STARTING state.
- slurmctld is in the BACKOFF state.
- slurmctld is in the BACKOFF state.
- slurmctld is in the STARTING state.
- slurmctld is in the FATAL state.
- slurmctld is in the FATAL state.
- slurmctld is in the FATAL state.
- slurmctld is in the FATAL state.
- slurmctld is in the FATAL state.
- slurmctld is in the FATAL state.
- Starting slurmd
slurmd: ERROR (spawn error)
- slurmd is in the BACKOFF state.
- slurmd is in the STARTING state.
- slurmd is in the BACKOFF state.
- slurmd is in the BACKOFF state.
- slurmd is in the STARTING state.
- slurmd is in the FATAL state.
- slurmd is in the FATAL state.
- slurmd is in the FATAL state.
- slurmd is in the FATAL state.
- slurmd is in the FATAL state.
- slurmd is in the FATAL state.
- Port 6817 is not listening
- Port 6817 is not listening
- Port 6817 is not listening
- Port 6817 is not listening
- Port 6817 is not listening
- Port 6817 is not listening
- Port 6817 is not listening
- Port 6817 is not listening
- Port 6817 is not listening
- Port 6817 is not listening
- Port 6817 is not listening
- Port 6818 is not listening
- Port 6818 is not listening
- Port 6818 is not listening
- Port 6818 is not listening
- Port 6818 is not listening
- Port 6818 is not listening
- Port 6818 is not listening
- Port 6818 is not listening
- Port 6818 is not listening
- Port 6818 is not listening
- Port 6818 is not listening
- Port 6819 is listening
- Waiting for the cluster to become available
sinfo: error: get_addr_info: getaddrinfo() failed: Name or service not known
sinfo: error: slurm_set_addr: Unable to resolve "slurmctl"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: No such file or directory
sinfo: error: get_addr_info: getaddrinfo() failed: Name or service not known
sinfo: error: slurm_set_addr: Unable to resolve "slurmctl"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: No such file or directory
sinfo: error: get_addr_info: getaddrinfo() failed: Name or service not known
sinfo: error: slurm_set_addr: Unable to resolve "slurmctl"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: No such file or directory
sinfo: error: get_addr_info: getaddrinfo() failed: Name or service not known
sinfo: error: slurm_set_addr: Unable to resolve "slurmctl"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: No such file or directory
sinfo: error: get_addr_info: getaddrinfo() failed: Name or service not known
sinfo: error: slurm_set_addr: Unable to resolve "slurmctl"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: No such file or directory
sinfo: error: get_addr_info: getaddrinfo() failed: Name or service not known
sinfo: error: slurm_set_addr: Unable to resolve "slurmctl"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: No such file or directory
sinfo: error: get_addr_info: getaddrinfo() failed: Name or service not known
sinfo: error: slurm_set_addr: Unable to resolve "slurmctl"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: No such file or directory
sinfo: error: get_addr_info: getaddrinfo() failed: Name or service not known
sinfo: error: slurm_set_addr: Unable to resolve "slurmctl"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: No such file or directory
sinfo: error: get_addr_info: getaddrinfo() failed: Name or service not known
sinfo: error: slurm_set_addr: Unable to resolve "slurmctl"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: No such file or directory
sinfo: error: get_addr_info: getaddrinfo() failed: Name or service not known
sinfo: error: slurm_set_addr: Unable to resolve "slurmctl"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: No such file or directory
sinfo: error: get_addr_info: getaddrinfo() failed: Name or service not known
sinfo: error: slurm_set_addr: Unable to resolve "slurmctl"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: No such file or directory

Slurm partitions failed to start successfully.
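
For anyone hitting the same failure, a quick way to see which hostname Slurm expects is to pull the unresolved names out of the captured container output. This is only a minimal sketch: the function name unresolved_hosts and the temporary log file are hypothetical, and the sample lines are copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the image): print each distinct hostname
# that Slurm tools reported as unresolvable in a saved container log.
unresolved_hosts() {
    # $1: path to a file holding the captured container output
    grep -o 'Unable to resolve "[^"]*"' "$1" | cut -d'"' -f2 | sort -u
}

# Sample lines taken from the failure output above.
log=$(mktemp)
cat > "$log" <<'EOF'
sinfo: error: get_addr_info: getaddrinfo() failed: Name or service not known
sinfo: error: slurm_set_addr: Unable to resolve "slurmctl"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: No such file or directory
EOF

unresolved_hosts "$log"
```

If the name printed differs from the value passed to docker run with -h, that mismatch is the likely cause, which is consistent with the README command (-h slurmctl) working.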

giovtorres commented 11 months ago

Updated! Thank you. I haven't done a good job of keeping the Docker Hub integration up to date.

yarikoptic commented 11 months ago

I see it has been adjusted (e.g. [image]), so I guess this could be closed (by @giovtorres or @asmacdo; I have no powers).