Closed by percyfal 6 years ago
Hi @percyfal. I'll have a look. In the meantime, I have another project that splits the different slurm components into their own containers. It contains a script to add the cluster to the database (see README), and sacct appears to be working as expected. You could give this a try.
Hi @percyfal, I refactored the supervisor config to get the start order right. Inside the container, I can now run these commands without issue:
[root@ernie ~]# sacctmgr --immediate add cluster name=linux
Adding Cluster(s)
Name = linux
[root@ernie ~]#
[root@ernie ~]#
[root@ernie ~]# supervisorctl restart slurmdbd
slurmdbd: stopped
slurmdbd: started
[root@ernie ~]# supervisorctl restart slurmctld
slurmctld: stopped
slurmctld: started
[root@ernie ~]# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up 5-00:00:00      5    unk c[1-5]
debug        up 5-00:00:00      5    unk c[6-10]
[root@ernie ~]# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up 5-00:00:00      5   idle c[1-5]
debug        up 5-00:00:00      5   idle c[6-10]
[root@ernie ~]#
[root@ernie ~]# sacctmgr show cluster
   Cluster     ControlHost  ControlPort   RPC     Share GrpJobs       GrpTRES GrpSubmit MaxJobs       MaxTRES MaxSubmit     MaxWall                  QOS   Def QOS
---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- ---------
     linux       127.0.0.1         6817  8192         1                                                                                           normal
[root@ernie ~]#
[root@ernie ~]# sacctmgr add account none,test Cluster=linux Description="none" Organization="none"
Adding Account(s)
none
test
Settings
Description = none
Organization = none
Associations
A = none C = linux
A = test C = linux
Settings
Parent = root
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
[root@ernie ~]#
[root@ernie ~]# sacctmgr show account
   Account                Descr                  Org
---------- -------------------- --------------------
      none                 none                 none
      root default root account                 root
      test                 none                 none
[root@ernie ~]#
Could you give it a try now?
Thanks.
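For anyone who wants to script the session above rather than type it, here is a minimal sketch. The function names `register_cluster` and `wait_for_nodes` are made up for illustration; the commands are the ones from the session, and the polling loop assumes that "ready" simply means no node reports `unk` any more, as in the second `sinfo` call.

```shell
#!/bin/sh
# Sketch only: wraps the commands from the session above.
# register_cluster and wait_for_nodes are hypothetical helper names.

# Register a cluster in the accounting database, then restart the daemons
# in the same order as the session: slurmdbd first, then slurmctld.
register_cluster() {
    cluster=${1:-linux}
    # --immediate commits the change without a confirmation prompt.
    sacctmgr --immediate add cluster name="$cluster" || return 1
    supervisorctl restart slurmdbd || return 1
    supervisorctl restart slurmctld || return 1
}

# Poll sinfo until no node reports the "unk" state.
# $1 = maximum attempts (default 30), $2 = seconds between attempts (default 1).
wait_for_nodes() {
    tries=${1:-30}
    delay=${2:-1}
    while [ "$tries" -gt 0 ]; do
        # -h drops the header; -o '%t' prints the compact node state.
        states=$(sinfo -h -o '%t' 2>/dev/null)
        # Empty output (controller not answering yet) counts as "not ready".
        if [ -n "$states" ] && ! printf '%s\n' "$states" | grep -q '^unk'; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

Inside the container this would be used as `register_cluster linux && wait_for_nodes`, after which `sacctmgr show cluster` should list the cluster as in the session.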
Hi @giovtorres, I can confirm that it now works like a charm! Thanks!
/P
I'm using the slurm container for various tests and would like to monitor the status of jobs using the sacct command. I fire up the container:
docker run -it -h ernie giovtorres/docker-centos7-slurm:latest
and submit a simple job:
scontrol works fine:
However, sacct fails since the table 'slurm_acct_db.linux_job_table' doesn't exist:
I cloned the repo and modified some settings in slurm.conf, to no avail. I have little experience setting up slurm, so I'm unsure what changes need to be applied.
The issue has been reported before (e.g. http://thread.gmane.org/gmane.comp.distributed.slurm.devel/6333 and https://bugs.schedmd.com/show_bug.cgi?id=1943) and one proposed solution is adding the table with sacctmgr:
However, the first command hangs in the container.
Do you have any idea for a solution?
Cheers,
Per
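A rough sketch of how the check and the add could be scripted, given the missing-table error and the hanging sacctmgr call described above. It assumes the cluster simply has not been registered with the accounting database yet, and it does not diagnose the hang; it only bounds it. The names `cluster_registered` and `ensure_cluster` are hypothetical, and `timeout` is the GNU coreutils tool, assumed to be present in the CentOS 7 image.

```shell
#!/bin/sh
# Sketch only: cluster_registered and ensure_cluster are hypothetical names.

# Succeed if the named cluster is already known to the accounting database.
cluster_registered() {
    # -n drops the header; -P prints unpadded, '|'-delimited fields.
    sacctmgr -n -P show cluster format=cluster 2>/dev/null | grep -qxF "$1"
}

# Add the cluster only when it is missing, giving up after $2 seconds
# (default 30) instead of hanging indefinitely.
ensure_cluster() {
    cluster=${1:-linux}
    limit=${2:-30}
    if cluster_registered "$cluster"; then
        echo "cluster $cluster already registered"
        return 0
    fi
    # timeout exits with status 124 if sacctmgr does not finish in time.
    timeout "$limit" sacctmgr --immediate add cluster name="$cluster"
}
```

Calling `ensure_cluster linux 30` either reports that the cluster exists or attempts the add; an exit status of 124 would indicate that sacctmgr was still hanging when the limit expired.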