aws-samples / aws-eda-slurm-cluster

AWS Slurm Cluster for EDA Workloads
MIT No Attribution

[FEATURE] Install slurm utilities in NFS area so all machines can see it, no need to recompile for every machine. #233

Closed gwolski closed 4 months ago

gwolski commented 4 months ago

I haven't studied the following completely by reviewing the code, so apologies in advance if it exists.

When I run the ansible playbook to install on my workstation machines so they can all submit to the HeadNode, slurm is being recompiled and installed locally every time.

Is there a way for me to just install the utilities on an NFS mounted area, say an NFS mounted /usr/local/slurm area, and then just reference that? Same for the config files that might be used to tell slurm where/who the HeadNode is?

Or is there some reason for this requirement?

cartalla commented 4 months ago

Let me test this, but it should be storing the compiled binaries on the Slurm head node's NFS export so that all instances can see them. So they should only need to be compiled once per OS distribution and architecture.

Let me test and make sure that it is detecting that it has already been done.

gwolski commented 4 months ago

As noted, I haven't dug into this, but I do see that the slurm commands are on the mounted head_node..pcluster:/opt/slurm

I have only installed on one "user workstation" so it might be ok and doing the right thing.

cartalla commented 4 months ago

The slurm binaries are only compiled on the submitter if they haven't previously been compiled for the submitter's OS and architecture. They are compiled locally and then installed at /opt/slurm/ClusterName/config/os/..., which is on the cluster's head node.

If you run the configuration script again it will run the ansible playbook, but it won't recompile the binaries because they already exist.
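To illustrate the "compile once per OS and architecture" check described above, here is a minimal sketch in Python. It is not the project's actual code (the real logic lives in the ansible playbook). The function names, the `<distro>/<arch>` directory layout, and the `bin/sbatch` marker file are all assumptions made for the example.

```python
import platform
from pathlib import Path


def build_dir_for(base: Path, distro: str, arch: str) -> Path:
    """Return the shared directory for one OS/architecture combination.

    Hypothetical layout: <base>/<distro>/<arch>. The real layout under
    the head node's NFS export may differ.
    """
    return base / distro / arch


def needs_compile(base: Path, distro: str, arch: str,
                  marker: str = "bin/sbatch") -> bool:
    """Return True if no previous build exists for this OS/architecture.

    The marker file (assumed here to be bin/sbatch) stands in for
    "the binaries were already compiled and installed".
    """
    return not (build_dir_for(base, distro, arch) / marker).exists()


if __name__ == "__main__":
    # Hypothetical base path and distro label, for illustration only.
    shared = Path("/tmp/slurm-shared-example")
    arch = platform.machine()
    if needs_compile(shared, "example-distro", arch):
        print(f"No build found for {arch}; would compile and install.")
    else:
        print(f"Build for {arch} already present; skipping compile.")
```

Because the check is keyed on distribution and architecture, a second workstation with the same OS and CPU type would find the existing build on the shared mount and skip compilation.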