This repo contains the Puppet environment and the classes that are used to define the roles of the instances in a Magic Castle cluster.
Roles are attributed to instances based on their tags. For each tag, a list of
classes to include is defined. This mechanism is explained in section `magic_castle::site`.
The parameters of the classes can be customized by defining values in the hieradata.
The `profile::` sections list the available classes, their roles and their parameters.
profile::accounts
profile::base
profile::base::azure
profile::base::etc_hosts
profile::base::powertools
profile::ceph::client
profile::consul
profile::consul::puppet_watch
profile::cvmfs::client
profile::cvmfs::local_user
profile::cvmfs::alien_cache
profile::efa
profile::fail2ban
profile::freeipa
profile::freeipa::base
profile::freeipa::client
profile::freeipa::server
profile::freeipa::mokey
profile::gpu
profile::jupyterhub::hub
profile::jupyterhub::node
profile::metrics::node_exporter
profile::metrics::slurm_job_exporter
profile::metrics::slurm_exporter
profile::nfs
profile::nfs::client
profile::nfs::server
profile::reverse_proxy
profile::rsyslog::base
profile::rsyslog::client
profile::rsyslog::server
profile::vector
profile::slurm::base
profile::slurm::accounting
profile::slurm::controller
profile::slurm::node
profile::software_stack
profile::squid::server
profile::sssd::client
profile::ssh::base
profile::ssh::known_hosts
profile::ssh::hostbased_auth::client
profile::ssh::hostbased_auth::server
profile::users::ldap
profile::users::local
profile::volumes
For classes with parameters, a folded default values subsection provides the default
value of each parameter as it would be defined in hieradata. For some parameters, the value is
displayed as `ENC[PKCS7,...]`. This corresponds to an encrypted random value generated by
`bootstrap.sh` on the Puppet server initial boot. These values are stored in
`/etc/puppetlabs/code/environment/data/bootstrap.yaml`, a file also created on Puppet server
initial boot.
magic_castle::site
| Variable | Description | Type |
|---|---|---|
| `all` | List of classes that are included by all instances | Array[String] |
| `tags` | Mapping tag-classes - instances that have the tag include the classes | Hash[Array[String]] |
| `enable_chaos` | Shuffle class inclusion order - used for debugging purposes | Boolean |
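
As a rough illustration of how these variables might look in hieradata, the sketch below assigns classes to every instance and to one tag. The tag name `monitoring` and the selection of classes are hypothetical, and whether such values replace or merge with the defaults depends on the lookup configuration, which this section does not describe.

```yaml
# Hypothetical hieradata sketch, not the shipped defaults.
magic_castle::site::all:
  - profile::base
  - profile::consul
magic_castle::site::tags:
  # Instances carrying the made-up tag `monitoring` include this class.
  monitoring:
    - profile::metrics::node_exporter
magic_castle::site::enable_chaos: false
```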
profile::accounts
This class configures two services to bridge LDAP users, Slurm accounts and users' folders in filesystems. The services are:
- `mkhome`: monitors new uid entries in slapd access logs and creates their corresponding /home and optionally /scratch folders.
- `mkproject`: monitors new gid entries in slapd access logs and creates their corresponding /project folders and Slurm accounts if the group matches the project regex.

| Variable | Description | Type |
|---|---|---|
| `project_regex` | Regex identifying FreeIPA groups that require a corresponding Slurm account | String |
| `skel_archives` | Archives extracted in each FreeIPA user's home when created | Array[Struct[{filename => String[1], source => String[1]}]] |
| `manage_home` | When true, `mkhome` creates a home folder for new FreeIPA users | Boolean |
| `manage_scratch` | When true, `mkhome` creates a scratch folder for new FreeIPA users | Boolean |
| `manage_project` | When true, `mkproject` creates a project folder for new FreeIPA users | Boolean |
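
A minimal sketch of these parameters in hieradata could look as follows. The regex, the archive file name and the URL are placeholders chosen for illustration, not values shipped with Magic Castle.

```yaml
# Hypothetical values for illustration only.
profile::accounts::project_regex: '^proj-[a-z0-9]+$'
profile::accounts::manage_home: true
profile::accounts::manage_scratch: false
profile::accounts::manage_project: true
profile::accounts::skel_archives:
  # Each entry needs a non-empty filename and source (see the Struct type above).
  - filename: 'welcome.tar.gz'
    source: 'https://example.org/skel/welcome.tar.gz'
```

With a regex like this one, a group named `proj-abc123` would match, so `mkproject` would create a Slurm account for it, while a group named `staff` would not.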
This class works at its full potential if these classes are also included:
profile::base
This class installs packages, creates files and installs services that have yet to justify the creation of a class of their own but are very useful to Magic Castle cluster operations.
| Variable | Description | Type |
|---|---|---|
| `version` | Current version number of Magic Castle | String |
| `admin_email` | Email of the cluster administrator, used to send logs and report cluster-related issues | String |
| `packages` | List of additional OS packages that should be installed | Array[String] |
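
For instance, an administrator could set a contact email and request two extra packages. The address and package names below are assumptions made for the example.

```yaml
# Hypothetical values; replace with your own address and package list.
profile::base::admin_email: 'admin@example.org'
profile::base::packages:
  - htop
  - tmux
```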
When `profile::base` is included, these classes are included too:
- epel
- selinux
- stdlib
- profile::base::azure (only when running in Microsoft Azure Cloud)
- profile::base::etc_hosts
- profile::base::powertools
- profile::ssh::base
- profile::mail::server (when parameter `admin_email` is defined)

profile::base::azure
This class ensures Microsoft Azure Linux Guest Agent is not installed as it tends to interfere with Magic Castle configuration. The class also installs Azure udev storage rules that would normally be provided by the Linux Guest Agent.
profile::base::etc_hosts
This class ensures that each instance declared in Magic Castle `main.tf` has an entry
in `/etc/hosts`. The IP addresses, FQDNs and short hostnames are taken from the `terraform.instances`
data structure provided by `/etc/puppetlabs/data/terraform_data.yaml`.
profile::base::powertools
This class ensures the DNF Powertools repo is enabled when using EL8. For all other EL versions, this class does nothing.
profile::ceph::client
Ceph is a free and open-source software-defined storage platform that provides object storage, block storage, and file storage built on a common distributed cluster foundation. reference
This class installs the Ceph packages, and configures and mounts CephFS shares.

| Variable | Description | Type |
|---|---|---|
| `mon_host` | List of Ceph monitor hostnames | Array[String] |
| `shares` | List of Ceph share structures | Hash[String, CephFS] |
This class only installs the Ceph packages.
profile::consul
Consul is a service networking platform developed by HashiCorp. reference
This class installs Consul and configures the service. An instance becomes a
Consul server agent if its local IP address is declared in `profile::consul::servers`.
Otherwise, it becomes a Consul client agent.

| Variable | Description | Type |
|---|---|---|
| `servers` | IP addresses of the consul servers | Array[String] |
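
To make the server/client selection concrete, the fragment below declares a single server address. The IP is made up for the example: an instance whose local IP address is `10.0.0.5` would become a Consul server agent, and every other instance would become a client agent.

```yaml
# Hypothetical address, for illustration only.
profile::consul::servers:
  - '10.0.0.5'
```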
When `profile::consul` is included, these classes are included too:
profile::consul::puppet_watch
This class configures a Consul watch event that, when triggered, restarts the Puppet agent. It is used mainly by Terraform to restart all Puppet agents across the cluster when the hieradata source files uploaded by Terraform are updated.
When `profile::consul::puppet_watch` is included, this class is included too:
profile::cvmfs::client
The CernVM File System (CVMFS) provides a scalable, reliable and low-maintenance software distribution service. It was developed to assist High Energy Physics (HEP) collaborations to deploy software on the worldwide-distributed computing infrastructure used to run data processing applications. CernVM-FS is implemented as a POSIX read-only file system in user space (a FUSE module). Files and directories are hosted on standard web servers and mounted in the universal namespace `/cvmfs`. reference
This class installs the CVMFS client and configures repositories.

| Variable | Description | Type |
|---|---|---|
| `quota_limit` | Instance local cache directory soft quota (MB) | Integer |
| `strict_mount` | If true, mount only the repositories listed in `repositories` | Boolean |
| `repositories` | Fully qualified repository names to include in use of utilities such as `cvmfs_config` | Array[String] |
| `alien_cache_repositories` | List of repositories that require an alien cache | Array[String] |
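
The following fragment is a sketch of a client configuration with a 4 GB soft quota and two repositories, one of which uses an alien cache. The repository names are invented for the example.

```yaml
# Hypothetical repository names and quota.
profile::cvmfs::client::quota_limit: 4096   # MB
profile::cvmfs::client::strict_mount: true
profile::cvmfs::client::repositories:
  - 'software.example.org'
  - 'data.example.org'
profile::cvmfs::client::alien_cache_repositories:
  - 'data.example.org'
```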
When `profile::cvmfs::client` is included, these classes are included too:
profile::cvmfs::local_user
This class configures a `cvmfs` local user.
This guarantees a consistent UID and GID for user cvmfs across
the cluster when using CVMFS Alien Cache.

| Variable | Description | Type |
|---|---|---|
| `cvmfs_uid` | cvmfs user id | Integer |
| `cvmfs_gid` | cvmfs group id | Integer |
| `cvmfs_group` | cvmfs group name | String |
profile::cvmfs::alien_cache
This class determines the location of the CVMFS alien cache.
| Variable | Description | Type |
|---|---|---|
| `alien_fs_root` | Shared file system where the alien cache will be created | String |
| `alien_folder_name` | Alien cache folder name | String |
profile::efa
This class installs the Elastic Fabric Adapter drivers on an AWS instance with an EFA network interface. reference
| Variable | Description | Type |
|---|---|---|
| `version` | EFA driver version | String |
profile::fail2ban
Fail2ban is an intrusion prevention software framework. Written in the Python programming language, it is designed to prevent brute-force attacks. reference
This class installs and configures fail2ban.
| Variable | Description | Type |
|---|---|---|
| `ignoreip` | List of IP addresses that can never be banned (compatible with CIDR notation) | Array[String] |
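
As an example of the CIDR-compatible notation, the fragment below exempts one network range and one single host. Both addresses come from documentation-reserved ranges and are placeholders.

```yaml
# Placeholder addresses from documentation-reserved ranges.
profile::fail2ban::ignoreip:
  - '192.0.2.0/24'
  - '198.51.100.7'
```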
Refer to puppet-fail2ban for more parameters to configure.
When `profile::fail2ban` is included, these classes are included too:
profile::freeipa
FreeIPA is a free and open source identity management system. FreeIPA is the upstream open-source project for Red Hat Identity Management. reference
This class configures the instance as either a FreeIPA client or a FreeIPA server based on the value
of `profile::freeipa::client::server_ip`. If this value matches the instance local IP address, the
server class `profile::freeipa::server` is included; otherwise, the
client class `profile::freeipa::client` is included.
When `profile::freeipa` is included, these classes can be included too:
profile::freeipa::base
This class configures files and services that are common to FreeIPA client and FreeIPA server.
| Variable | Description | Type |
|---|---|---|
| `domain_name` | FreeIPA primary domain | String |
profile::freeipa::client
This class installs packages, and configures files and services of a FreeIPA client.

| Variable | Description | Type |
|---|---|---|
| `server_ip` | FreeIPA server IP address | String |
profile::freeipa::server
This class configures files and services of a FreeIPA server.
| Variable | Description | Type |
|---|---|---|
| `id_start` | Starting user and group id number | Integer |
| `admin_password` | Password of the FreeIPA admin account | String |
| `ds_password` | Password of the directory server | String |
| `hbac_services` | Name of services to control with HBAC rules | Array[String] |
profile::freeipa::mokey
mokey is a web application that provides self-service user account management tools for FreeIPA. reference
This class installs mokey, configures its files and manages its service.

| Variable | Description | Type |
|---|---|---|
| `password` | Password of Mokey table in MariaDB | String |
| `port` | Mokey internal web server port | Integer |
| `enable_user_signup` | Allow users to create an account on the cluster | Boolean |
| `require_verify_admin` | Require a FreeIPA administrator to enable a Mokey-created account before usage | Boolean |
| `access_tags` | HBAC rule access tags for users created via mokey self-signup | Array[String] |
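
A site that wants self-signup but with administrator approval might combine the two booleans as sketched here. This is an illustrative combination, not a recommended default.

```yaml
# Illustrative combination: users can sign up, but an administrator
# has to enable each account before it can be used.
profile::freeipa::mokey::enable_user_signup: true
profile::freeipa::mokey::require_verify_admin: true
```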
profile::gpu
This class installs and configures the NVIDIA GPU drivers if an NVIDIA GPU is detected. The class configures nvidia-persistenced and nvidia-dcgm daemons when the GPU is connected via PCI passthrough, or configures nvidia-gridd when dealing with an NVIDIA VGPU.
For PCI passthrough, the class installs the latest CUDA drivers available
on NVIDIA yum repos.
For VGPU, the driver source is cloud provider specific and has to be specified
via either `profile::gpu::install::vgpu::rpm::source` for rpms or
`profile::gpu::install::vgpu::bin::source` for binary installer.
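
For the VGPU case, only one of the two sources is expected to be set. The sketch below uses the rpm variant with a placeholder URL; the actual location has to come from your cloud provider.

```yaml
# Placeholder URL; the real driver source is cloud provider specific.
profile::gpu::install::vgpu::rpm::source: 'https://example.org/drivers/nvidia-vgpu.rpm'
```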
profile::jupyterhub::hub
JupyterHub is a multi-user server for Jupyter Notebooks. It is designed to support many users by spawning, managing, and proxying many singular Jupyter Notebook servers. reference
This class installs and configures the hub part of JupyterHub.

| Variable | Description | Type |
|---|---|---|
| `register_url` | URL that links to register page. Empty string means no visible link. | String |
| `reset_pw_url` | URL that links to reset password page. Empty string means no visible link. | String |

When `profile::jupyterhub::hub` is included, this class is included too:
profile::jupyterhub::node
This class installs and configures the single-user notebook part of JupyterHub.
When `profile::jupyterhub::node` is included, these classes are included too:
profile::metrics::node_exporter
Prometheus is a free software application used for event monitoring and alerting. It records metrics in a time series database built using an HTTP pull model, with flexible queries and real-time alerting. reference
This class configures a Prometheus exporter that exports server usage metrics, for example CPU and memory usage. It should be included on every instance of the cluster.
When `profile::metrics::node_exporter` is included, these classes are included too:
profile::metrics::slurm_job_exporter
This class configures a Prometheus exporter that exports the Slurm compute node metrics, for example:
This exporter needs to run on compute nodes.
| Variable | Description | Type |
|---|---|---|
| `version` | The version of the slurm job exporter to install | String |
When `profile::metrics::slurm_job_exporter` is included, this class is included too:
- [profile::consul](#profileconsul)

profile::metrics::slurm_exporter
This class configures a Prometheus exporter that exports the Slurm scheduling metrics, for example:
This exporter typically runs on the Slurm controller server, but it can run on any server with a functional Slurm command-line installation.
profile::nfs
Network File System (NFS) is a distributed file system protocol [...] allowing a user on a client computer to access files over a computer network much like local storage is accessed. reference
This class instantiates either an NFS client or an NFS server.
If `profile::nfs::client::server_ip` matches the instance's local IP address, the
server class `profile::nfs::server` is included; otherwise, the
client class `profile::nfs::client` is included.
profile::nfs::client
This class installs NFS and configures the client to mount all shares exported by a single NFS server identified via its IP address.

| Variable | Description | Type |
|---|---|---|
| `server_ip` | IP address of the NFS server | String |

When `profile::nfs::client` is included, these classes are included too:
- nfs (`client_enabled => true`)

profile::nfs::server
This class installs NFS and configures an NFS server that will export all volumes tagged as `nfs`.

| Variable | Description | Type |
|---|---|---|
| `no_root_squash_tags` | Array of tags identifying instances that can mount NFS exports without root squash | Array[String] |

When `profile::nfs::server` is included, these classes are included too:
- nfs (`server_enabled => true`)

profile::reverse_proxy
Caddy is an extensible, cross-platform, open-source web server written in Go. [...] It is best known for its automatic HTTPS features. reference
This class installs and configures Caddy as a reverse proxy to expose Magic Castle cluster internal services to the Internet.

| Variable | Description | Type |
|---|---|---|
| `domain_name` | Domain name corresponding to the main DNS A record registered | String |
| `main2sub_redir` | Subdomain to redirect to when hitting domain name directly. Empty means no redirect. | String |
| `subdomains` | Subdomain names used to create vhosts to internal http endpoints | Hash[String, String] |
| `remote_ips` | List of allowed IP addresses per subdomain. Undef means no restrictions. | Hash[String, Array[String]] |
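
The fragment below sketches how these four parameters might fit together. It assumes that each key of `subdomains` is a subdomain name and each value the internal endpoint it proxies to, which the types suggest but this section does not spell out. The domain, endpoints and address range are all hypothetical.

```yaml
# Hypothetical domain, endpoints and address range.
profile::reverse_proxy::domain_name: 'mycluster.example.org'
# Visiting the bare domain redirects to the `jupyter` subdomain.
profile::reverse_proxy::main2sub_redir: 'jupyter'
profile::reverse_proxy::subdomains:
  jupyter: 'https://127.0.0.1:8000'
  ipa: 'https://ipa.int.mycluster.example.org'
# Only `ipa` is restricted; `jupyter` has no entry, hence no restriction.
profile::reverse_proxy::remote_ips:
  ipa:
    - '192.0.2.0/24'
```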
profile::rsyslog::base
Rsyslog is an open-source software utility used on UNIX and Unix-like computer systems for forwarding log messages in an IP network. reference
This class installs rsyslog and launches the service.
profile::rsyslog::client
This class installs and configures the rsyslog service to forward the instance's logs to rsyslog servers. The rsyslog servers are discovered by the instance via Consul.
When `profile::rsyslog::client` is included, these classes are included too:
profile::rsyslog::server
This class installs and configures the rsyslog service to receive forwarded logs from all rsyslog clients in the cluster.
When `profile::rsyslog::server` is included, these classes are included too:
profile::vector
This class installs and configures the vector.dev service to manage logs. Refer to the documentation for configuration.

| Variable | Description | Type | Optional ? |
|---|---|---|---|
| `config` | Content of the yaml configuration file | String | Yes |
profile::slurm::base
The Slurm Workload Manager, formerly known as Simple Linux Utility for Resource Management, or simply Slurm, is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters. reference
MUNGE (MUNGE Uid 'N' Gid Emporium) is an authentication service for creating and validating credentials. It is designed to be highly scalable for use in an HPC cluster environment. reference
This class installs base packages and config files that are essential to all Slurm roles. It also installs and configures the Munge service.

| Variable | Description | Type |
|---|---|---|
| `cluster_name` | Name of the cluster | String |
| `munge_key` | Base64 encoded Munge key | String |
| `slurm_version` | Slurm version to install | Enum['23.02', '23.11', '24.05'] |
| `os_reserved_memory` | Memory in MB reserved for the operating system on the compute nodes | Integer |
| `suspend_time` | Idle time (seconds) for nodes to become eligible for suspension. | Integer |
| `resume_timeout` | Maximum time permitted (seconds) between a node resume request and its availability. | Integer |
| `force_slurm_in_path` | Enable Slurm's bin path in the PATH environment variable of all users (local and LDAP) | Boolean |
| `enable_scrontab` | Enable user's Slurm-managed crontab | Boolean |
| `enable_x11_forwarding` | Enable Slurm's built-in X11 forwarding capabilities | Boolean |
| `config_addendum` | Additional parameters included at the end of slurm.conf. | String |
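
A sketch of a few of these parameters in hieradata follows. The cluster name, memory and timing values are arbitrary choices for the example, and the two `slurm.conf` lines in `config_addendum` are only there to show the multi-line string form.

```yaml
# Arbitrary example values.
profile::slurm::base::cluster_name: 'mycluster'
profile::slurm::base::slurm_version: '24.05'   # must be one of the Enum values
profile::slurm::base::os_reserved_memory: 512  # MB
profile::slurm::base::suspend_time: 3600       # seconds
profile::slurm::base::resume_timeout: 3600     # seconds
profile::slurm::base::enable_x11_forwarding: true
# Block scalar: each line is appended as-is at the end of slurm.conf.
profile::slurm::base::config_addendum: |
  MaxJobCount=20000
  JobRequeue=0
```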
When `profile::slurm::base` is included, these classes are included too:
profile::slurm::accounting
This class installs and configures the Slurm database daemon - slurmdbd. This class also installs and configures MariaDB for slurmdbd to store its tables.

| Variable | Description | Type |
|---|---|---|
| `password` | Password used by SlurmDBD to connect to MariaDB | String |
| `admins` | List of Slurm administrator usernames | Array[String] |
| `accounts` | Define Slurm account name and specifications | Hash[String, Hash] |
| `users` | Define association between usernames and accounts | Hash[String, Array[String]] |
| `options` | Define additional cluster's global Slurm accounting options | Hash[String, Any] |
| `dbd_port` | SlurmDBD service listening port | Integer |
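
The shapes of `accounts` and `users` can be sketched as below. The user and account names are invented, and the example assumes that account specifications are given as Slurm accounting option names with their values (here `Fairshare`) and that `users` maps a username to its list of accounts. Both assumptions follow from the types in the table rather than from a documented schema.

```yaml
# Hypothetical user and account names; option keys are assumed.
profile::slurm::accounting::admins:
  - alice
profile::slurm::accounting::accounts:
  def-alice:
    Fairshare: 1
profile::slurm::accounting::users:
  alice:
    - def-alice
profile::slurm::accounting::dbd_port: 6819
```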
When `profile::slurm::accounting` is included, these classes are included too:
profile::slurm::controller
This class installs and configures the Slurm controller daemon - slurmctld.

| Variable | Description | Type |
|---|---|---|
| `autoscale_version` | Version of Slurm Terraform cloud autoscale software to install | String |
| `tfe_token` | Terraform Cloud API Token. Required to enable autoscaling. | String |
| `tfe_workspace` | Terraform Cloud workspace id. Required to enable autoscaling. | String |
| `tfe_var_pool` | Variable name in Terraform Cloud workspace to control autoscaling pool | String |
| `selinux_context` | SELinux context for jobs (Slurm > 20.11) | String |

When `profile::slurm::controller` is included, these classes are included too:
profile::slurm::node
This class installs and configures the Slurm node daemon - slurmd.

| Variable | Description | Type |
|---|---|---|
| `enable_tmpfs_mounts` | Enable spank-cc-tmpfs_mounts plugin | Boolean |

When `profile::slurm::node` is included, this class is included too:
profile::software_stack
This class configures the initial shell profile that users will load on login and
the default set of Lmod modules that will be loaded. The software stack selected
depends on the Puppet fact `software_stack`, which is set by the Magic Castle Terraform
variable `software_stack`.

| Variable | Description | Type |
|---|---|---|
| `min_uid` | Minimum UID value required to load the software environment init script on login | Integer |
| `initial_profile` | Path to shell script initializing software environment variables | String |
| `extra_site_env_vars` | Map of environment variables that will be exported before sourcing profile shell scripts. | Hash[String, String] |
| `lmod_default_modules` | List of lmod default modules | Array[String] |
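
As an illustration, the fragment below sets a UID threshold, exports one site variable and lists two default modules. The variable name, its value and the module names are hypothetical and depend on the software stack in use.

```yaml
# Hypothetical values; module names depend on the selected software stack.
profile::software_stack::min_uid: 1000
profile::software_stack::extra_site_env_vars:
  SITE_NAME: 'mycluster'
profile::software_stack::lmod_default_modules:
  - gcc
  - openmpi
```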
When `profile::software_stack` is included, these classes are included too:
profile::squid::server
Squid is a caching and forwarding HTTP web proxy. It has a wide variety of uses, including speeding up a web server by caching repeated requests. reference
This class installs and configures the Squid service. Its main usage is to act as an HTTP cache for CVMFS clients in the cluster.

| Variable | Description | Type |
|---|---|---|
| `port` | Squid service listening port | Integer |
| `cache_size` | Amount of disk space (MB) | Integer |
| `cvmfs_acl_regex` | List of allowed CVMFS stratums as regexes | Array[String] |
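
A hedged example of these parameters: the port, cache size and the regex (which would match hostnames such as `cvmfs-s1-east.example.org`) are placeholders rather than defaults.

```yaml
# Placeholder values; the regex targets a made-up stratum hostname pattern.
profile::squid::server::port: 3128
profile::squid::server::cache_size: 4096   # MB
profile::squid::server::cvmfs_acl_regex:
  - '^cvmfs-s1.*\.example\.org$'
```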
When `profile::squid::server` is included, these classes are included too:
profile::sssd::client
The System Security Services Daemon is software originally developed for the Linux operating system that provides a set of daemons to manage access to remote directory services and authentication mechanisms. reference
This class configures external authentication domains.

| Variable | Description | Type |
|---|---|---|
| `domains` | Config dictionary of domains that can authenticate | Hash[String, Any] |
| `access_tags` | List of host tags that domain users can connect to | Array[String] |
| `deny_access` | Deny access to the domains on the host including this class. If undef, the access is defined by tags. | Optional[Boolean] |
profile::ssh::base
This class optimizes the SSH server daemon (sshd) configuration to achieve an A+ audit score on https://www.sshaudit.com/.
profile::ssh::known_hosts
This class populates the file `/etc/ssh/ssh_known_hosts`
with the cluster's instance ed25519 host keys using data provided by Terraform.
profile::ssh::hostbased_auth::client
This class allows instances to connect with SSH to instances including
`profile::ssh::hostbased_auth::server` using SSH hostbased authentication.
profile::ssh::hostbased_auth::server
This class enables SSH hostbased authentication on the instance including it.
| Variable | Description | Type |
|---|---|---|
| `shosts_tags` | Tags of instances that can connect to this server using hostbased authentication | Array[String] |
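
For example, to let instances carrying two particular tags connect with hostbased authentication, the parameter could be set as follows. The tag names `login` and `node` are assumptions about how the cluster's instances are tagged.

```yaml
# Hypothetical tag names.
profile::ssh::hostbased_auth::server::shosts_tags:
  - login
  - node
```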
When `profile::ssh::hostbased_auth::server` is included, this class is included too:
profile::users::ldap
This class allows the definition of FreeIPA users directly in YAML. The alternatives are to use the FreeIPA command-line, the FreeIPA web interface or Mokey.

| Variable | Description | Type |
|---|---|---|
| `users` | Dictionary of users to be created in LDAP | Hash[profile::users::ldap_user] |
| `access_tags` | List of 'tag:service' that LDAP users can connect to | Array[String] |

A `profile::users::ldap_user` is defined as a dictionary with the following keys:

| Variable | Description | Type | Optional ? |
|---|---|---|---|
| `groups` | List of groups the user has to be part of | Array[String] | No |
| `public_keys` | List of ssh authorized keys for the user | Array[String] | Yes |
| `passwd` | User's password | String | Yes |
| `manage_password` | If enabled, agents verify that the password hashes match | Boolean | Yes |

By default, Puppet will manage the LDAP user(s) password and change it in LDAP if its hash no
longer matches what is prescribed in YAML. To disable this feature, add
`manage_password: false` to the user(s) definition.
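
Putting the keys above together, a user definition with password management disabled might look like this sketch. The username, group, key material and the `login:sshd` access tag are placeholders. The access tag only illustrates the 'tag:service' form.

```yaml
# Hypothetical user; replace the key placeholder with a real public key.
profile::users::ldap::users:
  alice:
    groups:
      - def-alice
    public_keys:
      - 'ssh-ed25519 <public key material> alice@laptop'
    # Puppet will not reset this user's password in LDAP.
    manage_password: false
profile::users::ldap::access_tags:
  - 'login:sshd'
```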
profile::users::local
This class allows the definition of local users outside of the FreeIPA realm.
A local user's home is local to the machine where it is created and can be
found at the root of the filesystem, i.e. `/username`. Local users are the
only type of users in Magic Castle allowed to be sudoers.

| Variable | Description | Type |
|---|---|---|
| `users` | Dictionary of users to be created locally | Hash[profile::users::local_user] |

A `profile::users::local_user` is defined as a dictionary with the following keys:

| Variable | Description | Type | Optional ? |
|---|---|---|---|
| `groups` | List of groups the user has to be part of | Array[String] | No |
| `public_keys` | List of ssh authorized keys for the user | Array[String] | No |
| `sudoer` | If enabled, the user can sudo without password | Boolean | Yes |
| `selinux_user` | SELinux context for the user | String | Yes |
| `mls_range` | MLS Range for the user | String | Yes |
| `authenticationmethods` | Specifies AuthenticationMethods value for this user in sshd_config | String | Yes |
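
A minimal local sudoer could be sketched as follows, using only the two mandatory keys plus `sudoer`. The username, group and key material are placeholders.

```yaml
# Hypothetical local user; replace the key placeholder with a real public key.
profile::users::local::users:
  bob:
    groups:
      - wheel
    public_keys:
      - 'ssh-ed25519 <public key material> bob@workstation'
    sudoer: true
```

Following the description above, this user's home would be `/bob` on each machine where the class is included.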
profile::volumes
This class creates and mounts LVM volume groups. Each volume is formatted as XFS.
If a volume is expanded after the initial configuration, the class will not expand the LVM volume automatically. This operation currently has to be done manually.

| Variable | Description | Type |
|---|---|---|
| `devices` | Hash of devices | Hash[String, Hash[String, Hash]] |