As a workaround, I see that it is possible to include the login node description in the node list without adding the node to any partition. After that, the `slurmd` service starts successfully.
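Concretely, the workaround amounts to something like this in `slurm.conf` -- node names and hardware values below are just made-up examples:

```
# Login node declared with a NodeName entry (so the local slurmd can resolve
# its NodeName and start), but intentionally not listed in any partition.
NodeName=access1 CPUs=4 RealMemory=8000 State=UNKNOWN

# Compute nodes and partitions; access1 does not appear here.
NodeName=node[01-10] CPUs=28 RealMemory=128000 State=UNKNOWN
PartitionName=batch Nodes=node[01-10] Default=YES MaxTime=INFINITE State=UP
```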
Sorry for the delay, I just came back from SC'17, where I attended the Slurm User Group meeting.
With 17.11, it should become easier to do what you want, as there will be a better split between the packages (`slurm`, `slurmd`, `slurmctld`, and `slurmdbd`), allowing the packages to be assigned according to the roles of the nodes.
In any case, I'll have to rework the module to make it compliant with this specific version and the new packages -- I'll keep you updated once that's done.
Sounds great! I'll use the workaround in the meantime. It's fine with me if you prefer to close this issue or leave it open until it's solved in version 17.11.
This module now installs slurm 17.11.3-2 by default, with the above-mentioned new split of the packages. It should be fine, so I'm closing this issue. Do not hesitate to reopen it if needed.
Thanks for working on the module!
I still don't see how I can configure login nodes without running any daemons on them. If I understand correctly, those nodes only need the `slurm::install` and `slurm::config` classes, which are included only in classes that also run the daemons (`slurmd`, `slurmctld`, `slurmdbd`).
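To make the request concrete, here is a purely hypothetical sketch of what I would like to be able to write -- it assumes `slurm::install` and `slurm::config` could be included on their own, which is exactly what I don't see a way to do today:

```puppet
# Hypothetical profile for a daemon-less login node: install Slurm and
# render slurm.conf, but manage no slurmd/slurmctld/slurmdbd service.
# (Assumes slurm::install and slurm::config were usable standalone.)
class profile::slurm::login_only {
  include ::slurm::install
  include ::slurm::config
}
```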
Hum, actually we are running redundant login nodes (we call them `access*` nodes) that just run the `slurmd` daemon.
Here is an extract of the way we have them configured at the Hiera level, using the following hierarchy:
```yaml
hierarchy:
  #______________________
  - name: "Per-node data"                   # Human-readable name.
    path: "nodes/%{trusted.certname}.yaml"  # File path, relative to datadir.
                                     # ^^^^^ IMPORTANT: include the file extension!
  #_________________________________________________
  - name: "Site/Datacenter/Domain/Zone Specific data"
    paths:
      - "domain/%{facts.domain}.yaml"
      - "site/%{facts.site}.yaml"
      - "zone/%{facts.zone}.yaml"
  #___________________________
  - name: "Role Specific data"
    path: "role/%{facts.role}.yaml"
  #_____________________________________________
  - name: "Sysadmins/DevOps/Research teams data"
    path: "team/%{facts.team}.yaml"
  #_________________________
  - name: "OS Version Specific data"        # Uses custom facts
    path: "osrelease/%{facts.os.family}-%{facts.os.release.major}.yaml"
  #_________________________
  - name: "OS Specific data"                # Uses custom facts
    path: "osfamily/%{facts.os.family}.yaml"
  #_____________________
  - name: "Common data"
    path: "common.yaml"
```
Then:

- `slurm::*` parameters are set at the site level, i.e. in `site/<site>.yaml`
- `roles/access.yaml` is as follows:

```yaml
profiles:
  - '::profile::access::<cluster>'
  - '::profile::slurm::node'
slurm::manage_pam: true
slurm::service_manage: false
slurm::with_slurmdbd: false
slurm::with_slurmctld: false
slurm::with_slurmd: true
```
The controller roles are configured along the same lines, with `slurmctld` enabled and with or without `slurmdbd`:

```yaml
# Profiles key may be used to include profile classes
profiles:
  - '::profile::slurm'
  - '::profile::slurm::slurmctld'
slurm::service_manage: false
slurm::with_slurmd: false
slurm::with_slurmctld: true
slurm::with_slurmdbd: true
```

```yaml
# Profiles key may be used to include profile classes
profiles:
  - '::profile::slurm'
  - '::profile::slurm::slurmctld'
slurm::service_manage: false
slurm::with_slurmd: false
slurm::with_slurmctld: true
slurm::with_slurmdbd: false
```
Yes, we currently have a similar configuration, with the login nodes running the `slurmd` daemon.
But, as far as I understand, such nodes do not need to run the slurmd daemon; they only require `slurm.conf` and the installed slurm package with the client binaries (`srun`, `squeue`, etc.). For instance, when I stop the slurmd daemon on one of our login nodes with `systemctl stop slurmd`, I can still execute `sinfo` and `srun` commands.

In addition, with the slurmd daemon disabled, we no longer need to list these nodes in the slurm.conf file, since the cluster doesn't need to know anything about them.
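For example, on one of our login nodes (standard Slurm and systemd commands; the srun call is just a trivial test job):

```bash
# Stop the local slurmd: client commands keep working, since they only need
# slurm.conf and access to the slurmctld controller.
sudo systemctl stop slurmd
sinfo
srun -N1 hostname
```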
A new `slurm::login` class is in preparation...

The `slurm::login` class is now tested in the new Vagrant setup, which brings a simplified example of profiles (a minimal role-file sketch follows below):

- a `role=login` fact
- a `profiles::slurm::login` profile
- an `access` VM deployed with vagrant -- see Vagrantfile#L124-L127
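For illustration only, the role-level wiring could be as simple as the following sketch (using the `role`-based hierarchy shown earlier; the file name and exact keys are indicative, the reference setup is the Vagrant one linked above):

```yaml
# role/login.yaml (illustrative): login nodes only get the dedicated
# login profile -- Slurm installed and configured, no daemon enabled.
profiles:
  - '::profiles::slurm::login'
```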
In our setup, we want to have login nodes that can initiate Slurm commands (like `srun` or `sbatch`) but which are not part of the compute cluster. Therefore, all they need is Slurm installed and configured, without any daemons.

If I understand correctly, it's not possible to achieve such a configuration with this module currently. For instance, if we add the `slurm::slurmd` class to the node specification, then slurmd fails to start with the following error: `fatal: Unable to determine this slurmd's NodeName` (since the node is not described in the slurm.conf file), and simply including the `slurm` class doesn't create `slurm.conf`.

Please let me know if I missed something or if you have any thoughts regarding this issue.