The Slurm Workload Manager is an open source project designed to manage compute resources on computational clusters of various sizes.
This module is intended for the automated provisioning and management of compute nodes in a Red Hat/CentOS-based SLURM cluster.
For simplicity, some key configuration decisions have been assumed, namely:
This module requires that the SLURM packages be available to the local node from a package repository, and that a shared file system containing the SLURM configuration file already be mounted.
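One way to satisfy these prerequisites from Puppet itself is sketched below, using the built-in `yumrepo`, `file`, and `mount` resource types. The repository URL, the NFS server name, and the export path are hypothetical placeholders, not values this module provides; substitute whatever your site uses.

```puppet
# Hypothetical local repository carrying the SLURM and MUNGE packages.
yumrepo { 'slurm-local':
  descr    => 'Local SLURM packages',
  baseurl  => 'http://repo.example.com/slurm/el6/x86_64', # placeholder URL
  enabled  => 1,
  gpgcheck => 0, # consider enabling GPG checks outside of testing
}

# The mount resource does not create its mount point, so declare it first.
file { '/shared':
  ensure => directory,
}

# Hypothetical NFS export holding slurm.conf (and optionally the munge key).
mount { '/shared':
  ensure  => mounted,
  device  => 'nfs.example.com:/export/shared', # placeholder server and path
  fstype  => 'nfs',
  options => 'defaults',
  atboot  => true,
  require => File['/shared'],
}

# Apply the slurm class only once both prerequisites are in place.
class { 'slurm':
  munge_key_filename  => '/shared/secret/munge.key',
  slurm_conf_location => '/shared/slurm/etc',
  require             => [Yumrepo['slurm-local'], Mount['/shared']],
}
```

The `require` metaparameter on the class declaration keeps Puppet from touching SLURM packages or configuration until the repository and the shared mount exist.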
The slurm module requires some parameters to be functional.
The most basic example, using munge with a provided munge key:
class { 'slurm':
  munge_key_filename  => '/shared/secret/munge.key',
  slurm_conf_location => '/shared/slurm/etc',
}
Alternatively, if sharing the munge key over NFS is undesirable, you could place it on the Puppet file server (see the Puppet file serving documentation) and then pass the Puppet URI as the munge_key_filename.
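As a rough illustration of the file server variant, the declaration might look like the following. The `site_secrets` module name is a hypothetical placeholder for wherever the key is stored on your Puppet master; only the `puppet:///modules/<module>/<file>` URI form is standard.

```puppet
class { 'slurm':
  # Hypothetical location: files/munge.key inside a site-specific
  # 'site_secrets' module on the Puppet file server.
  munge_key_filename  => 'puppet:///modules/site_secrets/munge.key',
  slurm_conf_location => '/shared/slurm/etc',
}
```

With this approach the key travels to each node over the agent's authenticated connection to the master instead of being readable on a shared file system.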
The slurm module is intended to be modular for use in differently managed clusters. For instance:
class { 'slurm':
  disable_munge       => true,
  slurm_conf_location => '/shared/slurm/etc',
}
class { 'slurm':
  munge_key_filename  => '/shared/secret/munge.key',
  slurm_conf_location => '/shared/slurm/etc',
  disable_pam         => true,
}
The style of this module has been borrowed heavily from the puppetlabs-ntp module.
slurm::install: Handles package resources.
slurm::config: Handles editing configuration files, the /etc/slurm symbolic link, and the munge key.
slurm::service: Handles the slurmd and munge services.
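The install, config, service split usually implies an ordering: packages first, then configuration, with configuration changes restarting the services. The sketch below shows that pattern in generic form. It is not taken from this module's source; the wrapper class name `slurm_ordering_sketch` is hypothetical, and it assumes the three classes can be declared without explicit parameters.

```puppet
# Hypothetical wrapper illustrating the ordering pattern only.
class slurm_ordering_sketch {
  # Keep the three classes contained so that relationships on the
  # wrapper also apply to everything inside it.
  contain slurm::install
  contain slurm::config
  contain slurm::service

  # Install packages before writing configuration (->), and notify
  # the services whenever configuration changes (~>).
  Class['slurm::install']
  -> Class['slurm::config']
  ~> Class['slurm::service']
}
```

The notifying arrow (`~>`) is what lets an updated configuration file or munge key trigger a service restart without any manual step.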
Some features of the slurm module can be turned on or off through the use of boolean switches:
disable_munge
Turns off all handling of munge keys or services. This may be used in case munge is to be handled separately, or if another authentication system is desired altogether.
Defaults to false
disable_pam
Turns off all editing of the PAM stack. PAM will no longer meter access by users running jobs.
Defaults to false
disable_slurmd
Turns off the slurmd daemon service (this might be useful on login nodes, for instance).
Defaults to false
force_munge
Turns on the munged option --force, which causes the munge server to attempt to run even if it is unhappy with its environment.
Defaults to false
package_manage
Controls whether this module manages package installation. Set to false to turn off package installation, in case SLURM and/or MUNGE are to be handled in a different way.
Defaults to true
munge_key_filename
Local file path or Puppet file server URI of the munge key; it must be accessible by the compute node.
munge_service_name
Which service to manage for munge.
Defaults to munge for most operating systems.
package_ensure
Set to 'present' by default. You could change this to 'latest' to force Puppet to automatically keep SLURM/MUNGE packages updated.
slurm_conf_location
Directory on the compute node that contains the shared slurm.conf.
Set to undef by default.
slurm_service_name
Which service to manage for the local slurmd daemon.
Varies based on distribution.
sysconfigdir
Where SLURM expects to find daemon config files on this distro.
Varies based on distribution.
munge_packages
Set of packages to be maintained for munge.
pam_packages
Set of packages to be maintained for SLURM PAM integration.
slurm_packages
Set of packages to be maintained for SLURM itself.
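To show how several of these parameters combine, here is a sketch with two node definitions. The host names are hypothetical, and the parameter values are illustrative choices rather than recommendations.

```puppet
# Hypothetical compute node: Puppet keeps packages current and munged
# is started with --force.
node 'compute01.example.com' {
  class { 'slurm':
    munge_key_filename  => '/shared/secret/munge.key',
    slurm_conf_location => '/shared/slurm/etc',
    package_ensure      => 'latest',
    force_munge         => true,
  }
}

# Hypothetical node whose SLURM/MUNGE packages are installed by some
# other mechanism, so this module leaves package resources alone.
node 'compute02.example.com' {
  class { 'slurm':
    munge_key_filename  => '/shared/secret/munge.key',
    slurm_conf_location => '/shared/slurm/etc',
    package_manage      => false,
  }
}
```

Parameters that are omitted keep the defaults listed above, so each declaration only needs to name the settings that differ.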
This module is being developed on Red Hat Enterprise Linux (RHEL) version 6. Contributions helping to port to other distributions or operating systems are welcome. I've tried to leave it in a state that will be considerate of porting efforts.
I would be happy to review bug reports and pull requests via GitHub.
Fri Feb 5 2016 Chandler Wilkerson chwilk@gmail.com 0.1.2
Updated the disable_slurmd parameter to ensure_slurmd
Wed Feb 3 2016 Chandler Wilkerson chwilk@gmail.com 0.1.1
Added the disable_slurmd parameter
Mon Feb 1 2016 Chandler Wilkerson chwilk@gmail.com 0.1.0
Packaged and uploaded 0.1.0 release to Puppet Forge