Background

Jetstream2 will provide push-button clusters. Exosphere is one of the supported graphical user interfaces for Jetstream2. Exosphere is a 'pure client' application: it requires no persistent server-side services to interact with the OpenStack API. Currently the scripts in this repository only work when executed on a computer/server that has the OpenStack command-line tools installed. This presents a problem for a tool like Exosphere, since Exosphere's instance orchestration logic runs entirely in the browser. See the following issue on the Exosphere repository: https://gitlab.com/exosphere/exosphere/-/issues/636
The Exosphere developers considered the following three options:
A. Exosphere could perform all of this directly against the OpenStack API using the Exosphere orchestration engine
B. Exosphere could create a throw-away instance which runs cluster_create.sh to create the head node, then delete the throw-away instance
C. We could create a modified version of the cluster_create.sh script that runs on the head node itself, after Exosphere has created it
We decided to implement option C, with help from XSEDE Cyberinfrastructure Resource Integration staff. This pull request contains the resulting modifications, which make it possible to launch elastic Slurm clusters using Exosphere.
Some notable changes
New scripts called cluster_create_local.sh & install_local.sh (based on cluster_create.sh & install.sh), which run on an existing OpenStack instance and configure that instance as the head node of a new Slurm cluster
A new script called cluster_destroy_local.sh (based on cluster_destroy.sh), which runs on an existing OpenStack server/Slurm head node and cleans up any OpenStack resources created by the cluster_create_local.sh script (optionally including the head node itself; disabled by default)
By default, both cluster_create_local.sh and cluster_destroy_local.sh use the short host name of the head node as the cluster name and as the base name ($OS_PREFIX) for the OpenStack resources they create/destroy
By default both cluster_create_local.sh and cluster_destroy_local.sh assume that the openrc.sh file lives in the user's home directory (~/openrc.sh) instead of the current directory
Because cluster_create_local.sh assumes that it's running on a head node that already exists, it:
Does not create a new server to act as the cluster head node
Does not create a new floating IP address, nor attach one to the head node
Drops the HEADNODE_SIZE flag
Looks up the OpenStack instance UUID for the current instance using the OpenStack metadata service, and uses that UUID for any OpenStack operations involving the head node (sketched after this list)
Mounts the (optional) storage volume to the current instance
Runs install_local.sh on the current instance
Installs the OpenStack command line tools
Generates a new SSH key on the head node, and uses that to create a new OpenStack SSH public key (${OS_PREFIX}-elastic-key), instead of looking for an existing SSH key
Instead of creating a temporary OpenStack application credential for Slurm to launch compute nodes, and a new openrc.sh file to be copied to a new head node, it re-uses the credentials provided in the main openrc.sh file (Note: Exosphere generates this openrc.sh file from the same application credential that it uses when communicating with OpenStack, and injects it into the new head node at launch time using cloud-init)
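Putting those pieces together, here is a minimal sketch of the head-node-local setup steps described above. This is not the actual cluster_create_local.sh: the HEADNODE_UUID variable name and the key path are illustrative; only $OS_PREFIX and ${OS_PREFIX}-elastic-key come from the scripts themselves.

```
#!/bin/bash
# Minimal sketch of head-node-local setup (not the real cluster_create_local.sh).
set -euo pipefail

# Default the cluster name / resource prefix to the short hostname.
OS_PREFIX="${OS_PREFIX:-$(hostname -s)}"

# Credentials are expected in the user's home directory, not $PWD.
source "$HOME/openrc.sh"

# Look up this instance's UUID via the OpenStack metadata service, so
# later operations can target the head node explicitly.
HEADNODE_UUID=$(curl -s http://169.254.169.254/openstack/latest/meta_data.json |
  python3 -c 'import json, sys; print(json.load(sys.stdin)["uuid"])')

# Generate a fresh SSH key pair on the head node (if none exists) and
# register the public half with OpenStack.
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
  ssh-keygen -t rsa -b 4096 -N '' -f "$HOME/.ssh/id_rsa"
fi
openstack keypair create --public-key "$HOME/.ssh/id_rsa.pub" \
  "${OS_PREFIX}-elastic-key"

echo "Configured head node ${HEADNODE_UUID} for cluster ${OS_PREFIX}"
```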
Because cluster_destroy_local.sh assumes that it's running on a head node that already exists, it (see the cleanup sketch after this list):
Does not detach a floating IP address from the head node, nor delete it (Note: This will leave a floating IP address behind which will have to be cleaned up manually)
Does not delete a temporary app credential (because cluster_create_local.sh does not create one)
Deletes ${OS_PREFIX}-elastic-key (because cluster_create_local.sh always generates a new SSH key pair on the head node instance, and creates a corresponding OpenStack SSH public key)
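For illustration, a minimal cleanup sketch along these lines might look as follows. This is not the actual cluster_destroy_local.sh, and the DELETE_HEADNODE flag name is hypothetical, standing in for however the optional, disabled-by-default self-deletion is toggled.

```
#!/bin/bash
# Minimal cleanup sketch (not the real cluster_destroy_local.sh).
set -euo pipefail

OS_PREFIX="${OS_PREFIX:-$(hostname -s)}"
source "$HOME/openrc.sh"

# Delete the OpenStack public key that the create script registered.
openstack keypair delete "${OS_PREFIX}-elastic-key"

# Optionally delete the head node itself (off by default). Any floating
# IP attached to it is left behind and must be cleaned up manually.
if [ "${DELETE_HEADNODE:-false}" = "true" ]; then
  HEADNODE_UUID=$(curl -s http://169.254.169.254/openstack/latest/meta_data.json |
    python3 -c 'import json, sys; print(json.load(sys.stdin)["uuid"])')
  openstack server delete "$HEADNODE_UUID"
fi
```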
Note: These scripts are useful outside of the Exosphere client, and can be used from Horizon or any OpenStack client by adding the following snippet of shell script to the cloud-init of a new OpenStack instance:

```
su - centos -c "git clone --branch cluster-create-local --single-branch https://github.com/julianpistorius/CRI_Jetstream_Cluster.git; cd CRI_Jetstream_Cluster; ./cluster_create_local.sh"
```
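For example, one way to supply that snippet is as a cloud-init user-data script. This is just a sketch; it assumes a CentOS-based image where the default user is centos, so adjust the user name for other images.

```
#!/bin/bash
# Cloud-init runs a user-data script beginning with '#!' once, as root,
# on first boot. Clone the branch and run the create script as 'centos'.
su - centos -c "git clone --branch cluster-create-local --single-branch https://github.com/julianpistorius/CRI_Jetstream_Cluster.git; cd CRI_Jetstream_Cluster; ./cluster_create_local.sh"
```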
To test this using Exosphere, go to https://exosphere.jetstream-cloud.org and follow these instructions: https://gitlab.com/exosphere/exosphere/-/merge_requests/587#how-to-test

Once this PR is merged we will change the Exosphere code to reference this repository instead of my fork, and the main branch instead of the cluster-create-local branch.