giovtorres / docker-centos7-slurm

Slurm Docker Container on CentOS 7
MIT License

slurm 22.05 support and multiple-slurmd #49

Closed: tazend closed this 8 months ago

tazend commented 1 year ago

Version updates

Multiple node support

Compiles Slurm with multiple-slurmd support (the --enable-multiple-slurmd configure option). Without this option, multiple nodes cannot be operated correctly on a single host. Multiple nodes were previously configured in slurm.conf, but this always yielded lots of errors, even though it appeared to work when requesting the nodes. Running something like srun in a multi-node job would always fail; with multiple-slurmd support it now works correctly.

Accordingly, supervisord.conf has been adapted to start three slurmd instances by default.
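A minimal sketch of what running three slurmd instances on one host involves, assuming each emulated node gets its own NodeName and Port in slurm.conf and its own supervisord program that starts slurmd in the foreground with `-D -N <nodename>`. The node names (`c1` to `c3`), the base port 6001, the slurmd path, and the helper names are illustrative assumptions, not the values used in this PR.

```python
# Sketch: render slurm.conf node lines and supervisord program sections
# for several slurmd instances sharing one host.
# Node names, ports, and the slurmd path are assumptions for illustration.

def node_names(count, prefix="c"):
    """Return hypothetical node names such as c1, c2, c3."""
    return [f"{prefix}{i}" for i in range(1, count + 1)]


def slurm_node_lines(count, base_port=6001, hostname="localhost"):
    """One NodeName line per emulated node, each with a distinct port."""
    lines = []
    for offset, name in enumerate(node_names(count)):
        lines.append(
            f"NodeName={name} NodeHostname={hostname} Port={base_port + offset}"
        )
    return lines


def supervisord_sections(count, slurmd_path="/usr/sbin/slurmd"):
    """One supervisord program per node; -D keeps slurmd in the foreground,
    -N selects which emulated node this daemon serves."""
    sections = []
    for name in node_names(count):
        sections.append(
            f"[program:slurmd_{name}]\n"
            f"command={slurmd_path} -D -N {name}\n"
            "autostart=true\n"
        )
    return sections


if __name__ == "__main__":
    print("\n".join(slurm_node_lines(3)))
    print()
    print("\n".join(supervisord_sections(3)))
```

Each daemon needs a unique port because all of them share the same hostname; the `-N` argument is what ties a given process to its node definition.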

Changes in the entrypoint file

Other changes

gres.conf and the GPUs attached to the nodes have been removed for now, for the same reason as the multiple-node setup above: you can request a GPU, but it will probably yield errors, since no GPU device file is available. This may need more testing and could perhaps be added back later.

cgroup.conf has been added, explicitly setting the cgroup version to v1. Without it, Slurm would not start.
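As a rough illustration of pinning the cgroup version, the sketch below renders a one-line cgroup.conf using the `CgroupPlugin` option, which in Slurm 22.05 accepts `autodetect`, `cgroup/v1`, or `cgroup/v2`. The helper name is hypothetical, and the actual file added in this PR may contain further settings.

```python
# Sketch: render a cgroup.conf that pins the cgroup plugin version.
# The helper is hypothetical; only the CgroupPlugin option is taken
# from Slurm's cgroup.conf documentation.

VALID_PLUGINS = ("autodetect", "cgroup/v1", "cgroup/v2")


def render_cgroup_conf(plugin="cgroup/v1"):
    """Return cgroup.conf contents selecting the given cgroup plugin."""
    if plugin not in VALID_PLUGINS:
        raise ValueError(f"unsupported CgroupPlugin value: {plugin!r}")
    return f"CgroupPlugin={plugin}\n"


if __name__ == "__main__":
    print(render_cgroup_conf(), end="")
```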

Also enabled config_overrides in slurm.conf for slurmd, so an arbitrary amount of resources can be declared for a node, e.g. 56 CPUs and 512 GB of RAM, even though they are not physically available.
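A minimal sketch of what such a declaration could look like, assuming the override is set through `SlurmdParameters=config_overrides` and the resources are declared with `CPUs` and `RealMemory` on the node line. `RealMemory` is given in megabytes, so 512 GB is taken here as 512 * 1024 MB. The node name `c1` and the helper names are illustrative assumptions.

```python
# Sketch: render slurm.conf lines that declare more resources than the
# host physically has. The node name and helpers are assumptions.

def node_definition(name, cpus, memory_gb):
    """Build a NodeName line; RealMemory is expressed in megabytes."""
    real_memory_mb = memory_gb * 1024
    return f"NodeName={name} CPUs={cpus} RealMemory={real_memory_mb}"


def render_overrides(name="c1", cpus=56, memory_gb=512):
    """config_overrides lets slurmd accept the declared values even when
    they exceed what the host actually provides."""
    return "\n".join(
        [
            "SlurmdParameters=config_overrides",
            node_definition(name, cpus, memory_gb),
        ]
    )


if __name__ == "__main__":
    print(render_overrides())
```

With values like these, jobs can be submitted against the declared 56 CPUs, but nothing guarantees the container can actually run that much work at once.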

yarikoptic commented 8 months ago

Also enable config_overrides in the slurm.conf for the slurmd, so we can declare an arbitrary amount of resources for the node, e.g 56 CPUs and 512GB of RAM, even though it isn't physically available.

@tazend thank you for this PR! We are testing it for our use case. Would you be so kind as to also add some instructions to README.md on how to use the parametrisations you have added and how to provide those config_overrides when needed?

giovtorres commented 8 months ago

@tazend would you mind updating the README in a follow-up PR? Thanks!

asmacdo commented 8 months ago

@tazend I'm happy to give an unfamiliar-eyes review of a README update. I'm excited to try out config_overrides :)

tazend commented 8 months ago

Hi @giovtorres,

Yeah, I will try to update the README soon.