Azure / cyclecloud-slurm

Azure CycleCloud project to enable users to create, configure, and use Slurm HPC clusters.
MIT License

Bugfix/fix gres conf #139

Closed ryanhamel closed 1 year ago

ryanhamel commented 1 year ago

Fixes the write location for gres.conf during azslurm scale, and adds useful warnings when /etc/slurm/gres.conf is not a symlink, including how to create the missing symlink.

ryanhamel commented 1 year ago

When I try to generate gres.conf from azslurm scale, I get this:

[root@slurm302A-scheduler ~]# azslurm scale
WARNING: please run 'ln -s /sched/gres.conf /etc/slurm/gres.conf' && chown slurm:slurm /etc/slurm/gres.conf

Why can't we just add this here and generate /etc/slurm/gres.conf in code?

As a rule, azslurm is not supposed to touch anything under /etc/slurm; it only recreates files under /sched/. That way, if a CX decides to use a completely different, self-managed gres.conf, for example, they can do so.

We set up the link properly during installation, but azslurm scale deliberately does not repair those links, so that a CX can manage those configuration files themselves. I did at least want to add a warning about how to repair it, though.
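
For illustration, the warn-but-don't-repair behavior described above could look roughly like the following. This is a minimal sketch, not the actual azslurm code: the function name `gres_link_warning` and the wording of the "regular file" warning are assumptions, and only the `ln -s` hint mirrors the output quoted earlier in this thread. The check is read-only and never modifies anything under /etc/slurm.

```python
import os
import sys


def gres_link_warning(link_path="/etc/slurm/gres.conf",
                      target_path="/sched/gres.conf"):
    """Return a warning string if link_path is not a symlink, else None.

    Hypothetical helper: it only inspects the filesystem and never
    creates, repairs, or overwrites anything.
    """
    if os.path.islink(link_path):
        # The link exists (wherever it points); nothing to report.
        return None
    if os.path.exists(link_path):
        # A regular file is present, most likely self-managed by the user.
        # This message text is an assumption, not azslurm's real output.
        return (f"WARNING: {link_path} is not a symlink to {target_path}; "
                f"changes written to {target_path} will not take effect")
    # Nothing is there at all: tell the user how to create the link.
    return (f"WARNING: please run 'ln -s {target_path} {link_path}' "
            f"&& chown slurm:slurm {link_path}")


if __name__ == "__main__":
    message = gres_link_warning()
    if message:
        print(message, file=sys.stderr)
```

Returning the message instead of acting on it keeps the decision with whoever manages /etc/slurm, which is the point made above.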