Azure / cyclecloud-slurm

Azure CycleCloud project to enable users to create, configure, and use Slurm HPC clusters.
MIT License

"azslurm scale" does not maintain original ownership of gres.conf and azure.conf #193

Closed: thielen24 closed this issue 8 months ago

thielen24 commented 8 months ago

In my environment, the umask is set so that newly created files are not readable by "other" users. Because "azslurm scale" also does not preserve the original file ownership when it regenerates gres.conf and azure.conf, the regenerated files become unreadable by slurmctld, and the service fails to restart at the end of the scale function.
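The umask effect can be reproduced in isolation. The sketch below creates a file under a few illustrative umask values (the reporter's actual umask is not stated) and reports whether "other" can read it. Assuming the regenerated file ends up owned by root:root with a mode such as 0640, a slurmctld running as an unprivileged SlurmUser (commonly `slurm`, an assumption here) is neither the owner nor in the group, so it cannot read the file.

```python
import os
import stat
import tempfile


def mode_for_new_file(umask: int) -> int:
    """Create a file under the given umask and return its permission bits."""
    old = os.umask(umask)  # os.umask installs the new mask and returns the old one
    try:
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "azure.conf.tmp")
            # open() requests mode 0o666; the umask then clears bits from it
            with open(path, "w") as fw:
                fw.write("# generated\n")
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # restore the process-wide umask


if __name__ == "__main__":
    for mask in (0o022, 0o027, 0o077):
        mode = mode_for_new_file(mask)
        readable = bool(mode & stat.S_IROTH)
        print(f"umask {mask:04o} -> mode {mode:04o}, other-readable: {readable}")
```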

From slurm/src/slurmcc/cli.py:

        with open(azure_conf + ".tmp", "w") as fw:
            _partitions(
                partition_dict,
                fw,
                allow_empty=False,
                autoscale=is_autoscale_enabled(),
            )

        logging.debug("Moving %s to %s", azure_conf + ".tmp", azure_conf)
        shutil.move(azure_conf + ".tmp", azure_conf)

        _update_future_states(node_mgr)

        with open(gres_conf + ".tmp", "w") as fw:
            _generate_gres_conf(partition_dict, fw)
        shutil.move(gres_conf + ".tmp", gres_conf)
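The metadata loss comes from the write-then-move pattern itself: when source and destination are on the same filesystem, shutil.move uses os.rename, so the destination path ends up pointing at the temporary file with whatever owner, group and mode that file was created with. The sketch below shows this with permission bits, which can be observed without root; owner and group follow the same rule. File contents and umask values are illustrative.

```python
import os
import shutil
import stat
import tempfile


def mode_after_tmp_move(existing_mode: int, umask: int) -> int:
    """Mimic the cli.py pattern and return the live file's mode afterwards."""
    old = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as d:
            conf = os.path.join(d, "gres.conf")
            with open(conf, "w") as fw:
                fw.write("# original\n")
            os.chmod(conf, existing_mode)  # permissions chosen for the live file
            # Same write-to-.tmp-then-move sequence as in cli.py
            with open(conf + ".tmp", "w") as fw:
                fw.write("# regenerated\n")
            shutil.move(conf + ".tmp", conf)
            return stat.S_IMODE(os.stat(conf).st_mode)
    finally:
        os.umask(old)


if __name__ == "__main__":
    # The live file was 0644, but the .tmp file created under umask 077 wins.
    print(oct(mode_after_tmp_move(0o644, 0o077)))
```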

One potential fix is to copy the existing file's owner and group onto the temporary file before moving it into place:

        with open(azure_conf + ".tmp", "w") as fw:
            _partitions(
                partition_dict,
                fw,
                allow_empty=False,
                autoscale=is_autoscale_enabled(),
            )
        st = os.stat(azure_conf)
        os.chown(azure_conf + ".tmp", st.st_uid, st.st_gid)
        logging.debug("Moving %s to %s", azure_conf + ".tmp", azure_conf)
        shutil.move(azure_conf + ".tmp", azure_conf)

        _update_future_states(node_mgr)

        with open(gres_conf + ".tmp", "w") as fw:
            _generate_gres_conf(partition_dict, fw)
        st = os.stat(gres_conf)
        os.chown(gres_conf + ".tmp", st.st_uid, st.st_gid)
        shutil.move(gres_conf + ".tmp", gres_conf)
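The same idea could be factored into one helper used for both files. This is a hypothetical sketch (`replace_preserving_owner` is not part of cyclecloud-slurm). It also tolerates a missing destination on first run, where the proposal's `os.stat` would raise FileNotFoundError; it skips `os.chown` when ownership already matches; and it copies the permission bits too, which goes beyond the original proposal. It uses `os.replace` rather than `shutil.move` because the temporary file sits in the same directory.

```python
import logging
import os
import stat
import tempfile
from typing import Callable, TextIO


def replace_preserving_owner(path: str, write: Callable[[TextIO], object]) -> None:
    """Hypothetical helper: regenerate `path` through `path + ".tmp"` while
    keeping the existing file's owner, group and permission bits."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fw:
        write(fw)
    try:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            st = None  # first run: nothing to preserve, umask defaults apply
        if st is not None:
            tmp_st = os.stat(tmp)
            if (tmp_st.st_uid, tmp_st.st_gid) != (st.st_uid, st.st_gid):
                # Giving the file to a different user requires root.
                os.chown(tmp, st.st_uid, st.st_gid)
            # chmod after chown, since chown may clear setuid/setgid bits.
            os.chmod(tmp, stat.S_IMODE(st.st_mode))
        logging.debug("Moving %s to %s", tmp, path)
        os.replace(tmp, path)  # atomic rename within one directory
    except OSError:
        # Do not leave a stray .tmp file behind if chown/chmod/replace fails.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        conf = os.path.join(d, "azure.conf")
        replace_preserving_owner(conf, lambda fw: fw.write("# first run\n"))
        os.chmod(conf, 0o644)
        old = os.umask(0o077)  # a restrictive umask like the one in the report
        try:
            replace_preserving_owner(conf, lambda fw: fw.write("# regenerated\n"))
        finally:
            os.umask(old)
        print(oct(stat.S_IMODE(os.stat(conf).st_mode)))  # prints 0o644
```

In cli.py the `write` argument would wrap the existing generators, for example `lambda fw: _generate_gres_conf(partition_dict, fw)`.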

Adding the chown generally requires azslurm to run as root, but perhaps that was already a built-in assumption or requirement.
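If azslurm is not meant to require root, a pre-flight check could report in advance whether ownership can be restored. The function below is a hypothetical sketch, not an existing azslurm feature. It relies on the POSIX rule that an unprivileged process can only leave a file owned by its own uid and can only assign a group it belongs to.

```python
import os
import tempfile


def can_restore_ownership(path: str) -> bool:
    """Hypothetical check: can this process give a freshly written
    replacement for `path` the same owner and group?"""
    if not hasattr(os, "geteuid"):
        return False  # non-POSIX platform: no chown semantics to rely on
    if os.geteuid() == 0:
        return True  # root may chown to any uid/gid
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return True  # nothing to preserve yet
    # Unprivileged: the owner must already be us, and the group one of ours.
    own_groups = set(os.getgroups()) | {os.getegid()}
    return st.st_uid == os.geteuid() and st.st_gid in own_groups


if __name__ == "__main__":
    # Illustrative only: check a file this process just created.
    with tempfile.NamedTemporaryFile() as f:
        print(can_restore_ownership(f.name))
```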