aws-samples / aws-eda-slurm-cluster

AWS Slurm Cluster for EDA Workloads
MIT No Attribution

[FEATURE] Add support for ParallelCluster 3.10.0 #242

Closed cartalla closed 2 months ago

cartalla commented 3 months ago

Is your feature request related to a problem? Please describe.

https://github.com/aws/aws-parallelcluster/releases/tag/v3.10.0

A key feature in this version is support for an external slurmdbd daemon. Previously, each cluster ran its own slurmdbd daemon on its controller, which is not the correct architecture. A shared accounting database should have a single slurmdbd daemon that is used by all of the Slurm cluster controllers sharing that database.

ParallelCluster adds a new config option: ExternalSlurmdbd.
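For illustration, here is a minimal sketch of where the new option might sit in a cluster config. It assumes that `ExternalSlurmdbd` lives under `Scheduling`/`SlurmSettings` and takes `Host` and `Port` keys; check the ParallelCluster 3.10.0 documentation before relying on this. The helper name `external_slurmdbd_settings` and the hostname are hypothetical, and 6819 is only the usual slurmdbd default port.

```python
import json


def external_slurmdbd_settings(host: str, port: int = 6819) -> dict:
    """Build the config fragment pointing a cluster at an external slurmdbd.

    Assumption: the option sits under Scheduling/SlurmSettings and accepts
    Host and Port. This helper is illustrative and not part of this repo.
    """
    return {
        "Scheduling": {
            "SlurmSettings": {
                "ExternalSlurmdbd": {"Host": host, "Port": port},
            }
        }
    }


# Hypothetical hostname for the single shared slurmdbd instance.
fragment = external_slurmdbd_settings("slurmdbd.example.internal")

# JSON is a subset of YAML, so this output can be pasted into a YAML config.
print(json.dumps(fragment, indent=2))
```

Every cluster that shares the accounting database would point at the same host, so only one slurmdbd daemon talks to the database.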

A key change in this version is that munge was updated from 0.5.15 to 0.5.16. This is critical because all of the clusters sharing an existing slurmdbd, and all of the clusters accessed from a login node such as a DCV desktop, need to run the same version of munge. This means that clusters running a ParallelCluster version earlier than 3.10.0 can't share a Slurm database or a login node with clusters running 3.10.0 or later.
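The constraint above can be sketched as a quick check over the ParallelCluster versions of the clusters you plan to group together. This is only a sketch: it encodes the 3.10.0 boundary described in this issue, it does not inspect the munge version actually installed, and the function names are hypothetical.

```python
# ParallelCluster version at which munge moved from 0.5.15 to 0.5.16.
MUNGE_BOUNDARY = (3, 10, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a release string such as '3.10.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def crosses_munge_boundary(versions: list[str]) -> bool:
    """Return True if the versions fall on both sides of the 3.10.0 boundary.

    A True result means these clusters should not share a slurmdbd or a
    login node, because their munge versions would differ.
    """
    sides = {parse_version(v) >= MUNGE_BOUNDARY for v in versions}
    return len(sides) > 1


# A 3.9.x cluster cannot be grouped with a 3.10.0 cluster.
mixed = crosses_munge_boundary(["3.9.1", "3.10.0"])

# Clusters that are all on 3.10.0 or later stay on the same side.
same_side = crosses_munge_boundary(["3.10.0", "3.10.1"])
```

Versions are compared as integer tuples rather than strings so that 3.10.0 sorts after 3.9.1.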