charmed-hpc / slurm-snap

Snap package for Slurm. Slurm is a highly scalable cluster management and job scheduling system for large and small Linux clusters
https://slurm.schedmd.com
Apache License 2.0

[Bug]: Setting and unsetting the `slurmd.config-server` option does not restore the environment #16

Open jedel1043 opened 3 months ago

jedel1043 commented 3 months ago

To reproduce

sudo snap set slurm slurmd.config-server=controller
sudo cat /var/snap/slurm/common/.env # SLURMD_CONFIG_SERVER is set
sudo snap unset slurm slurmd.config-server
sudo cat /var/snap/slurm/common/.env # SLURMD_CONFIG_SERVER is still set and not removed

NucciTheBoss commented 3 months ago

Can reproduce. I consider this a bug: after some period of time, a site may choose to use a local slurm.conf file directly rather than the config server, and the stale setting gets in the way of that.

NucciTheBoss commented 2 months ago

Setting config-server should unstage any slurm.conf file that was previously configured. For example, this could be a simple file rename like $SNAP_COMMON/etc/slurm/slurm.conf -> $SNAP_COMMON/etc/slurm/slurm.conf.bak. This way the slurm.conf configuration file in conf-cache/ supersedes the original configuration, while the original is preserved in case the administrator decides to unset config-server.

If slurm.conf options are set while configless mode is enabled on slurmd, a warning can be emitted stating that the options will not be applied. Then, if config-server is unset, the configure hook could restore slurm.conf.bak.
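A rough sketch of the stash/restore idea, to make the proposal concrete. The function names are hypothetical and are not taken from the snap's existing hooks; the only paths assumed are the ones mentioned above.

```shell
#!/bin/sh
# Hypothetical helpers for a configure hook; none of these names exist in the snap today.
# SNAP_COMMON is expected to be provided by snapd when this runs inside a hook.

conf_path() {
    printf '%s/etc/slurm/slurm.conf' "${SNAP_COMMON}"
}

# Called when config-server is set: move a pre-configured slurm.conf out of the way.
stash_slurm_conf() {
    conf="$(conf_path)"
    if [ -f "$conf" ]; then
        mv "$conf" "$conf.bak"
    fi
}

# Called when config-server is unset: bring the original back,
# but never overwrite a slurm.conf that already exists.
restore_slurm_conf() {
    conf="$(conf_path)"
    if [ -f "$conf.bak" ] && [ ! -e "$conf" ]; then
        mv "$conf.bak" "$conf"
    fi
}

# $1: current value of config-server (empty string when unset).
warn_if_configless() {
    if [ -n "$1" ]; then
        echo "warning: config-server is set; slurm.conf options will not be applied" >&2
    fi
}
```

In a real hook the current value would presumably come from something like `snapctl get slurmd.config-server`, with `stash_slurm_conf` called when it is non-empty and `restore_slurm_conf` otherwise.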

As for the .env file itself, the SLURMD_CONFIG_SERVER entry should be removed when config-server is unset. That tells the slurmd wrapper that we're no longer running in configless mode and are instead running in traditional (or potentially dynamic) mode.
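For the .env side, something along these lines might be enough. This is only a sketch: `unset_env_key` is a hypothetical helper, and it assumes the file holds plain `KEY=value` lines, which may not match exactly how the snap writes it.

```shell
#!/bin/sh
# Hypothetical helper: drop one variable from a dotenv-style file.
# Assumes one KEY=value entry per line; quoting or "export " prefixes are not handled.
# $1: path to the .env file, $2: variable name to remove.
unset_env_key() {
    env_file="$1"
    key="$2"
    [ -f "$env_file" ] || return 0
    tmp="$(mktemp)"
    # grep exits non-zero when no lines remain, which is fine here.
    grep -v "^${key}=" "$env_file" > "$tmp" || true
    # Write back in place so the file keeps its owner and mode.
    cat "$tmp" > "$env_file"
    rm -f "$tmp"
}
```

The configure hook could then call `unset_env_key "$SNAP_COMMON/.env" SLURMD_CONFIG_SERVER` whenever the option comes back empty.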