access-ci-org / Jetstream_Cluster

Scripts and Ansible Playbooks for building an HPC-style resource in Jetstream
MIT License

Ansible & Slurm Fixes #21

Closed by zacharygraber 2 weeks ago

zacharygraber commented 2 weeks ago

Implements fixes for #19 and #20.

Changes

Ansible packages/version/collections

slurm.conf

Other (Ansible)


Testing

I've tested this on Jetstream2/Exosphere by using these steps. Please feel free to validate/replicate:

  1. In Exosphere, open the instance creation flow

  2. Select Featured-RockyLinux8 as the image

  3. Set "Create your own SLURM cluster with this instance as the head node" to "yes"

  4. In the Boot Script section, replace {create-cluster-command} inline with: su - rocky -c "git clone --branch rocky-linux --single-branch --depth 1 https://github.com/zacharygraber/Jetstream_Cluster.git; cd Jetstream_Cluster; ./cluster_create_local.sh -d 2>&1 | tee local_create.log"

    • This does exactly what Exosphere normally does to create the cluster, except that it points to my fork instead of this repo.

  5. Wait for setup to complete, then SSH into the instance

  6. sudo su - rocky

  7. cd Jetstream_Cluster

  8. Inspect the output in local_create.log to make sure everything looks alright.

  9. Run sbatch slurm_test.job, wait for it to finish, then verify the output.
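
For anyone replicating steps 8 and 9, the log inspection can be partly scripted. The following is only a rough sketch, not part of this PR: check_create_log is a hypothetical helper name, and it assumes local_create.log contains Ansible recap lines with failed= and unreachable= counters. It does not replace reading the log by eye.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): scan the cluster creation log
# for recap counters that indicate a failed or unreachable host.
# Assumption: the log contains Ansible recap lines such as
#   localhost : ok=10 changed=3 unreachable=0 failed=0

# Returns 0 if no non-zero failed=/unreachable= counter is found,
# 1 if at least one is found, and 2 if the log cannot be read.
check_create_log() {
    local log_file="$1"
    local pattern='(failed|unreachable)=[1-9][0-9]*'

    if [ ! -r "$log_file" ]; then
        echo "cannot read log file: $log_file" >&2
        return 2
    fi

    if grep -Eq "$pattern" "$log_file"; then
        # Print the offending lines with line numbers to stderr.
        grep -En "$pattern" "$log_file" >&2
        return 1
    fi

    return 0
}
```

From inside Jetstream_Cluster, this could be used as `check_create_log local_create.log && sbatch slurm_test.job`, so the test job is only submitted when the log shows no failed or unreachable hosts.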

julianpistorius commented 2 weeks ago

Works! Thank you @zacharygraber.