ComputeCanada / puppet-magic_castle

Puppet Environment repo for Magic Castle - https://github.com/ComputeCanada/magic_castle
MIT License

Viewing state of cloud nodes #378

Open ocaisa opened 2 months ago

ocaisa commented 2 months ago

On a scalable cluster, if the Terraform Cloud token expires, nodes become "unresponsive". With Slurm < 24, the state of cloud nodes is not visible unless you set

PrivateData=cloud

in your slurm.conf. As stated in https://support.schedmd.com/show_bug.cgi?id=2771, this is the exact opposite of what you would expect from this setting, and the behaviour has been fixed in Slurm 24.05.

Is it possible to add this setting for Slurm < 24?
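
To make the request concrete, here is a rough sketch of the version check I have in mind, written in Python only for illustration. The function name `private_data_line` is hypothetical, it assumes the version string has at least a `major.minor` part (e.g. `23.11.4`), and it is not how puppet-magic_castle actually generates slurm.conf.

```python
from typing import Optional


def private_data_line(slurm_version: str) -> Optional[str]:
    """Return the slurm.conf line to add, or None when it is not needed.

    Hypothetical helper: assumes a version string such as "23.11.4".
    """
    # Compare only (major, minor), e.g. "23.11.4" -> (23, 11).
    major, minor = (int(part) for part in slurm_version.split(".")[:2])
    # Before 24.05, cloud node state is hidden unless PrivateData=cloud is set.
    if (major, minor) < (24, 5):
        return "PrivateData=cloud"
    return None


print(private_data_line("23.11.4"))  # PrivateData=cloud
print(private_data_line("24.05.0"))  # None
```

The same condition could presumably be expressed in whatever templates slurm.conf, so that the line is only emitted for the older Slurm releases.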

ocaisa commented 2 months ago

If you can't see that nodes are unresponsive, it's not easy to figure out why the cluster is not scaling.