galaxyproject / ansible-slurm

Ansible role for installing and managing the Slurm Workload Manager

Slurmdbd does not recognize the parameter SlurmctldPidFile #48

Open Oberfeldwedler opened 1 year ago

Oberfeldwedler commented 1 year ago

Hello there,

My slurmdbd does not recognize the parameter `SlurmctldPidFile`. When I comment out the following line, it works: `# SlurmctldPidFile: "{{ __slurm_run_dir ~ '/slurmdbd.pid' if __slurm_debian else omit }}"`
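That default only produces a value when `__slurm_debian` is true; otherwise Ansible's `omit` drops the key, which may be why only some users hit the error. A minimal Python emulation of the conditional, assuming `__slurm_run_dir` is `/run/slurm` (matching the pid paths in the slurm.conf posted in this issue) and that the template writes keys in case-insensitive order, as the posted files suggest. `OMIT`, `slurmdbd_pid_default` and `render` are hypothetical names, not part of the role:

```python
# Emulation of the role's default entry in plain Python; OMIT stands in for
# Ansible's special `omit` value. Hypothetical names, not the role's own code.
OMIT = object()


def slurmdbd_pid_default(slurm_run_dir: str, is_debian: bool) -> object:
    """Mirror of: __slurm_run_dir ~ '/slurmdbd.pid' if __slurm_debian else omit"""
    return slurm_run_dir + "/slurmdbd.pid" if is_debian else OMIT


def render(config: dict) -> str:
    """Write key=value lines in case-insensitive key order, skipping OMIT values."""
    items = sorted(config.items(), key=lambda kv: kv[0].lower())
    return "\n".join(f"{key}={value}" for key, value in items if value is not OMIT)


for debian in (True, False):
    cfg = {
        "AuthType": "auth/munge",
        "SlurmctldPidFile": slurmdbd_pid_default("/run/slurm", debian),  # assumed run dir
    }
    print(f"__slurm_debian={debian}")
    print(render(cfg))
```

On a Debian host the rendered file gains a `SlurmctldPidFile=/run/slurm/slurmdbd.pid` line; on other hosts the key is absent.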

I also could not find this parameter in the slurmdbd.conf documentation: https://slurm.schedmd.com/slurmdbd.conf.html#OPT_PidFile.

I did, however, find the parameter `PidFile`, but using it threw the same error.

Aug 02 18:37:06 slurmhost slurmdbd[15616]: error: _parse_next_key: Parsing error at unrecognized key: SlurmctldPidFile
Aug 02 18:37:06 slurmhost slurmdbd[15616]: fatal: Could not open/read/parse slurmdbd.conf file /etc/slurm/slurmdbd.conf
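The log shows slurmdbd treating an unrecognized key as fatal: it reports the parsing error and then refuses to load slurmdbd.conf at all. A rough pre-flight check in that spirit can be sketched in Python. `unknown_keys` and `KNOWN_KEYS` are hypothetical, and the allowlist is only the subset of keys from the slurmdbd.conf posted in this issue, not slurmdbd's full keyword list:

```python
# Hypothetical pre-flight check: flag keys in a slurmdbd.conf-style file that are
# not in a given allowlist. KNOWN_KEYS is an illustrative subset (the keys from
# the file posted in this issue), NOT the full list slurmdbd accepts.
KNOWN_KEYS = {
    "ArchiveJobs", "ArchiveSteps", "AuthType", "DbdHost", "DbdPort",
    "DebugLevel", "LogFile", "SlurmUser", "StorageHost", "StorageLoc",
    "StoragePass", "StoragePort", "StorageType", "StorageUser",
}


def unknown_keys(text: str, known: set[str]) -> list[str]:
    """Return keys from key=value lines that are not in `known`."""
    bad = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key not in known:
            bad.append(key)
    return bad


sample = """\
##
## This file is maintained by Ansible - ALL MODIFICATIONS WILL BE REVERTED
##
AuthType=auth/munge
SlurmctldPidFile=/run/slurm/slurmdbd.pid
"""
print(unknown_keys(sample, KNOWN_KEYS))  # ['SlurmctldPidFile']
```

This is a simplified line parser; it does not reproduce every rule of Slurm's own config parser.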

I would prefer to set the location of my slurmdbd.pid file, as I do with `SlurmctldPidFile` and `SlurmdPidFile`. Can this be accomplished?

I am running slurm-wlm 21.08.5.

/etc/slurm/slurmdbd.conf

```
root@slurmhost:/var/log/slurm# cat /etc/slurm/slurmdbd.conf
##
## This file is maintained by Ansible - ALL MODIFICATIONS WILL BE REVERTED
##
ArchiveJobs=yes
ArchiveSteps=yes
AuthType=auth/munge
DbdHost=slurmhost
DbdPort=6819
DebugLevel=4
LogFile=/var/log/slurm/slurmdbd.log
SlurmUser=slurm
StorageHost=localhost
StorageLoc=slurmdb
StoragePass=CBB...........2B8
StoragePort=3306
StorageType=accounting_storage/mysql
StorageUser=slurm
```

/etc/slurm/slurm.conf

```
root@slurmhost:/var/log/slurm# cat /etc/slurm/slurm.conf
##
## This file is maintained by Ansible - ALL MODIFICATIONS WILL BE REVERTED
##
# Configuration options
AccountingStorageEnforce=limits
AccountingStorageHost=slurmhost
AccountingStorageType=accounting_storage/slurmdbd
AuthType=auth/munge
ClusterName=ei-hpc-cluster
CryptoType=crypto/munge
GresTypes=gpu
InactiveLimit=0
JobAcctGatherType=jobacct_gather/linux
JobCompType=jobcomp/none
KillWait=30
MinJobAge=300
MpiDefault=none
PriorityDecayHalfLife=7-0
PriorityType=priority/multifactor
PriorityWeightAge=1000
PriorityWeightFairshare=100000
PriorityWeightPartition=10000
ProctrackType=proctrack/pgid
ReturnToService=2
SchedulerParameters=nohold_on_prolog_fail
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
SlurmctldDebug=5
SlurmctldHost=slurmhost
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmctldPidFile=/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmctldTimeout=300
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdPidFile=/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm/d
SlurmdTimeout=300
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm/ctld
SwitchType=switch/none
Waittime=0
# Nodes
NodeName=ei-srv-018 Boards=1 CoresPerSocket=16 CPUs=32 RealMemory=240000 SocketsPerBoard=2 State=UNKNOWN ThreadsPerCore=2
# Partitions
PartitionName=normal Default=YES MaxTime=60 Nodes=ALL PriorityJobFactor=10000 State=UP
PartitionName=day AllowAccounts=professor,mitarbeiter,student MaxTime=1440 Nodes=ALL PriorityJobFactor=6000 State=UP
PartitionName=long AllowAccounts=professor,mitarbeiter,student MaxTime=10080 Nodes=ALL PriorityJobFactor=1000 State=UP
PartitionName=priority AllowAccounts=admin MaxTime=UNLIMITED Nodes=ALL PriorityJobFactor=5000 State=UP
```

cat-bro commented 1 year ago

We have the same issue. As a workaround, for the time being we are overriding `__slurmdbd_config_default`: https://github.com/usegalaxy-au/infrastructure/blob/master/group_vars/all.yml#L267-L272
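Why override the internal default instead of a user-facing variable? Assuming the role builds slurmdbd.conf by layering a user dictionary (assumed here to be `slurmdbd_config`) over `__slurmdbd_config_default` with a shallow merge, in the spirit of Ansible's `combine` filter, the user dictionary can change a value but cannot delete a key the default already defines. A Python sketch of that reasoning; `merged` and the example dictionaries are hypothetical:

```python
# Hypothetical illustration: a shallow merge (like dict unpacking or Ansible's
# `combine` filter) can replace values but never deletes keys from the base.
def merged(default: dict, user: dict) -> dict:
    """Return a new dict with `user` entries layered over `default`."""
    return {**default, **user}


role_default = {
    "AuthType": "auth/munge",
    "SlurmctldPidFile": "/run/slurm/slurmdbd.pid",  # the key slurmdbd rejects
}
user_config = {"DbdHost": "slurmhost"}

# Adding settings through the user-facing dict leaves the bad key in place...
print("SlurmctldPidFile" in merged(role_default, user_config))  # True

# ...whereas replacing the default itself (the workaround) removes it.
patched_default = {k: v for k, v in role_default.items() if k != "SlurmctldPidFile"}
print("SlurmctldPidFile" in merged(patched_default, user_config))  # False
```

The trade-off is that a replaced default no longer picks up future changes the role makes to it.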