[Open] vbosquier opened this issue 3 years ago
Hi Vincent,
unfortunately there is no way to do this with the current version of clustermgtd. We would need to expose an option in the clustermgtd config file to exclude some nodes from the daemon's management operations.
If you want to go ahead and patch clustermgtd so that you can manually add additional nodes to Slurm, the required code change should be minimal. From a very quick look, it seems that removing the head node from the active_nodes and inactive_nodes lists initialized here should be enough: https://github.com/aws/aws-parallelcluster-node/blob/develop/src/slurm_plugin/clustermgtd.py#L397.
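If you would rather patch the installed daemon in place than rebuild the node package, here is a rough sketch. The install path varies by ParallelCluster version, so this searches for the file rather than hard-coding a path, and the supervisord restart at the end is an assumption about how the daemon is managed on the head node:

```bash
# locate the clustermgtd source installed on the head node
# (the node package lives in a version-specific virtualenv, so search for it)
find /opt/parallelcluster -name clustermgtd.py 2>/dev/null

# after editing the file to drop the head node from the active_nodes /
# inactive_nodes lists linked above, restart the daemon; on the head node
# clustermgtd is typically supervised rather than a standalone service
service supervisord restart
```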
As long as I set the master node's name in the partition-[st/dy]-instancetype-ordinal format, I was able to add my master node to slurm.conf. In my case, that's master-dy-t3large-1 (st didn't work for me, even though the node isn't actually dynamic). I also had to add this name as a valid alias in /etc/hosts and enable slurmd on the master, and after that it worked as expected. Tested on 2.10.3.
Here are my postinstall actions relevant to this:
# generate a clustermgtd-compatible node name (dots are stripped because
# the node-name convention has none, e.g. t3.large -> t3large)
master_type=$(curl -s http://169.254.169.254/latest/meta-data/instance-type)
master_name=$(echo "master-dy-$master_type-1" | tr -d '.')
# report 75% of physical memory to slurm, in MiB
master_memory=$(awk '/MemTotal/ { printf "%d\n", $2/1024 * 0.75 }' /proc/meminfo)
# do not run the suspend/resume scripts for the master node
echo "SuspendExcNodes=$master_name" >> /opt/slurm/etc/slurm.conf
# add a node definition for the master node
echo "NodeName=$master_name CPUs=2 RealMemory=$master_memory State=UNKNOWN Feature=local,$master_type" >> /opt/slurm/etc/slurm.conf
# add a dedicated partition for the master node
echo "PartitionName=master Nodes=$master_name MaxTime=INFINITE DefMemPerCPU=2048 Default=NO" >> /opt/slurm/etc/slurm.conf
# append the generated node name as an alias on the last line of /etc/hosts
sed -i "$ s/$/ $master_name/" /etc/hosts
# install and start the slurmd service on the master
cp /etc/chef/cookbooks/aws-parallelcluster/files/default/slurmd.service /etc/systemd/system
systemctl daemon-reload
service slurmd start
service slurmctld restart
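A quick sanity check after the script runs (using the example t3.large node name from above):

```bash
# confirm the master node and its dedicated partition are visible to slurm
sinfo -p master
scontrol show node master-dy-t3large-1
# run a trivial job on the new partition to verify slurmd is accepting work
srun -p master -N1 hostname
```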
It would be good, though, to have a simple option in the pcluster config to include the master node as its own partition.
Hi @chambm,
Thank you for suggesting the workaround. However, I would like to point out that this modification is NOT safe. clustermgtd will attempt to replace any node configured in slurm that is in a problematic state, so if the head node somehow becomes DOWN in slurm, or another problematic scenario happens, the head node instance will likely be terminated, and this would break the cluster.
We suggest not configuring the head node in slurm at this time. If you do want to use a workaround, please be sure that the head node is excluded from clustermgtd actions, so it won't be terminated by the daemon. We will continue to evaluate this feature request.
Thank you!
Hmm. Several times while I was working on getting the head node to work in Slurm, it was set to DOWN by clustermgtd, but it was never terminated. I definitely would have remembered that. :) It hasn't been set to DOWN since I got it working, though. Did you test this workaround and see it terminate the head node?
Hi @chambm,
Sorry, I was mistaken: the current logic in clustermgtd only retrieves compute instances, so the head node instance cannot be terminated by clustermgtd.
However, because of this, if the NodeAddr of the head node configured in slurm ever pointed to the real head node instance, clustermgtd would set the node to DOWN, because it cannot retrieve the head node instance and therefore thinks there is no actual instance backing the head node configured in slurm.
There are still some issues with the workaround that I would like to call out:

1. SuspendExcNodes essentially makes it a static node, because it is excluded from the normal Suspend logic. This will also interfere with any other static nodes in the system, because slurm will just overwrite SuspendExcNodes and remove the other static nodes from that setting.
2. The dynamic naming convention, master-dy-$master_type-1, makes clustermgtd not treat the node as a static node either, which means there is no process that will terminate the node.
3. Because NodeAddr is not configured, the mapping between master-dy-$master_type-1 and the actual head node instance is never established. The node may still get used by slurm as a dynamic node, which will trigger the Resume program (SuspendExcNodes does not prevent resume from running). In that case, however, the Resume program will launch a new instance to back master-dy-$master_type-1. So essentially a new dynamic node instance is launched, but the actual head node instance is never used by slurm.

We would like to point out again that configuring the head node in slurm is not a supported path, and a workaround will most likely require some custom changes to the node package logic.
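On the third point, here is a minimal sketch of what pinning the node to the real head node could look like, reusing the variables from the postinstall script above. This is an illustration, not a supported path, and per the caveat earlier in this thread clustermgtd may still mark the node DOWN because it cannot map it to a managed compute instance:

```bash
# assumption: run on the head node, replacing the plain NodeName line from the
# workaround so slurm maps the name to the real head node instead of resuming
# a new instance
master_ip=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
echo "NodeName=$master_name NodeAddr=$master_ip NodeHostname=$(hostname -s) CPUs=2 RealMemory=$master_memory State=UNKNOWN" >> /opt/slurm/etc/slurm.conf
# make slurmctld re-read the configuration
/opt/slurm/bin/scontrol reconfigure
```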
Thank you!
Hi ParallelCluster Dev Team!
With previous versions of ParallelCluster (up to 2.8.1), we used to configure an additional partition in Slurm for our Master node.
We are currently moving to PC 2.10.1 with support for multiple compute queues.
After we add the custom NodeName and Partition in a dedicated file included at the end of slurm.conf, we get the following errors:
"2021-01-20 15:13:42,304 - [slurm_plugin.clustermgtd:manage_cluster] - INFO - Retrieving nodes info from the scheduler 2021-01-20 15:13:43,372 - [slurm_plugin.clustermgtd:_get_node_info_from_partition] - ERROR - Failed when getting partition/node states from scheduler with exception 2021-01-20 15:13:43,373 - [slurm_plugin.clustermgtd:manage_cluster] - ERROR - Unable to get partition/node info from slurm, no other action can be performed. Sleeping..."
After investigating, we found that the check fails because of the Master node. As soon as we remove the Master node and the custom partition from the configuration and restart the slurm daemons, the heartbeat works fine again.
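For reference, the failing check can be reproduced roughly like this (the exact scontrol invocation clustermgtd uses is an assumption; the log path is the standard ParallelCluster location):

```bash
# roughly reproduce the scheduler queries clustermgtd performs
/opt/slurm/bin/scontrol -o show partition
/opt/slurm/bin/scontrol -o show node
# watch the daemon's log while the custom partition is configured
tail -f /var/log/parallelcluster/clustermgtd
```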
Can you help us find the appropriate way to have the Master Node in a custom partition AND maintain successful clustermgtd checks on the compute partitions?
Best regards, Vincent.