
slurm


Table of Contents

  1. Overview
  2. Module Description - What the module does and why it is useful
  3. Setup - The basics of getting started with slurm
  4. Usage - Configuration options and additional functionality
  5. Reference - An under-the-hood peek at what the module is doing and how
  6. Limitations - OS compatibility, etc.
  7. Development - Guide for contributing to the module

Overview

The Slurm Workload Manager is an open source project designed to manage compute resources on computational clusters of various sizes.

Module Description

This module is intended for the automated provisioning and management of compute nodes in a RedHat/CentOS based SLURM cluster.

For simplicity, some key configuration decisions have been assumed; see Setup Requirements below.

Setup

What slurm affects

Setup Requirements

This module requires that SLURM packages be made available to the local node from a package repository, and that a shared file system containing the SLURM configuration file already be mounted.
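
For example, a minimal sketch that satisfies the package requirement using Puppet's built-in yumrepo resource (the repository URL here is hypothetical):

yumrepo { 'slurm':
    ensure   => present,
    descr    => 'Local SLURM and MUNGE packages',
    baseurl  => 'http://repo.example.com/slurm/el6/$basearch',  # hypothetical URL
    enabled  => 1,
    gpgcheck => 0,
}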

Beginning with slurm

The slurm module requires a few parameters in order to be functional.

The most basic example, using munge with a provided munge key:

class { 'slurm':
    munge_key_filename  => '/shared/secret/munge.key',
    slurm_conf_location => '/shared/slurm/etc',
}

Alternatively, if sharing the munge key over NFS is undesirable, you could place it on the Puppet file server and pass the puppet:/// URI as the munge_key_filename.
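
A sketch of that approach, assuming the key has been placed in a (hypothetical) secrets module on the Puppet file server:

class { 'slurm':
    munge_key_filename  => 'puppet:///modules/secrets/munge.key',
    slurm_conf_location => '/shared/slurm/etc',
}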

Usage

The slurm module is intended to be modular enough for use in differently managed clusters. For instance:

We use something other than MUNGE to authenticate

class { 'slurm':
    disable_munge       => true,
    slurm_conf_location => '/shared/slurm/etc',
}

We don't want users logging into compute nodes, whether or not they have jobs there.

class { 'slurm':
    munge_key_filename  => '/shared/secret/munge.key',
    slurm_conf_location => '/shared/slurm/etc',
    disable_pam         => true,
}

Reference

The style of this module has been borrowed heavily from the puppetlabs-ntp module.

Classes

Public Classes

Private Classes

Parameters

Booleans

Some features of the slurm module can be turned on or off through the use of boolean switches:

disable_munge

Turns off all handling of munge keys and services. This may be used if munge is to be managed separately, or if another authentication system is desired altogether.

Defaults to false

disable_pam

Turns off all editing of the PAM stack. PAM will no longer limit access to users with running jobs.

Defaults to false

disable_slurmd

Turns off the slurmd daemon service (might be useful on login nodes, for instance).

Defaults to false
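
For instance, a login node might be declared like this (a sketch, with the other parameters carried over from the basic example):

class { 'slurm':
    munge_key_filename  => '/shared/secret/munge.key',
    slurm_conf_location => '/shared/slurm/etc',
    disable_slurmd      => true,
}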

force_munge

Turns on munged's --force option, which causes the munge daemon to attempt to run even if it is unhappy with its environment.

Defaults to false

package_manage

Controls whether this module installs the SLURM and MUNGE packages. Set to false if the packages are to be handled in a different way.

Defaults to true
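
A sketch of a node whose SLURM and MUNGE packages are maintained outside Puppet (other parameters carried over from the basic example):

class { 'slurm':
    munge_key_filename  => '/shared/secret/munge.key',
    slurm_conf_location => '/shared/slurm/etc',
    package_manage      => false,
}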

Strings

munge_key_filename

File path or Puppet file server URI to the munge key, accessible by the compute node.

munge_service_name

Which service to manage for munge.

Defaults to munge on most operating systems

package_ensure

Defaults to 'present'; change this to 'latest' to have Puppet keep the SLURM and MUNGE packages updated automatically.
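
For example, a sketch that keeps the packages on their latest versions (other parameters carried over from the basic example):

class { 'slurm':
    munge_key_filename  => '/shared/secret/munge.key',
    slurm_conf_location => '/shared/slurm/etc',
    package_ensure      => 'latest',
}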

slurm_conf_location

Directory on the compute node that contains the shared slurm.conf

Set to undef by default.

slurm_service_name

Which service to manage for the local slurmd daemon.

Varies based on distribution.

sysconfigdir

Where SLURM expects to find daemon config files on this distro.

Varies based on distribution.

Arrays

munge_packages

Set of packages to be maintained for munge.

pam_packages

Set of packages to be maintained for SLURM PAM integration.

slurm_packages

Set of packages to be maintained for SLURM itself.
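
If the default package lists do not match your repositories, you can pass your own. A sketch (the package names here are hypothetical):

class { 'slurm':
    munge_key_filename  => '/shared/secret/munge.key',
    slurm_conf_location => '/shared/slurm/etc',
    slurm_packages      => ['slurm', 'slurm-plugins'],  # hypothetical package names
}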

Limitations

This module is being developed on Red Hat Enterprise Linux (RHEL) version 6. Contributions that help port it to other distributions or operating systems are welcome; I have tried to keep the code in a state that should ease such porting efforts.

Development

I would be happy to review bug reports and pull requests via GitHub.

Release Notes/Contributors/Etc