On Slurm clusters, line 168 of the palace script (scripts/palace) generates an error because NODE_LIST=$SLURM_JOB_NODELIST holds a list of node names and is not a file (unlike PBS_NODEFILE, which is a path to a file).
For example, with two allocated nodes named d05-41 and d05-42, SLURM_JOB_NODELIST=d05-[41-42] and the error is:
cat: d05-[41-42]: No such file or directory
--------------------------------------------------------------------------
No nodes are available for this job, either due to a failure to
allocate nodes to the job, or allocated nodes being marked
as unavailable (e.g., down, rebooting, or a process attempting
to be relocated to another node when none are available).
--------------------------------------------------------------------------
The following command will generate a file containing node hostnames within a Slurm job:
scontrol show hostnames $SLURM_JOB_NODELIST > $TMPDIR/nodefile.txt
For example, with the two nodes above, nodefile.txt will have one hostname per line:
d05-41
d05-42
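One way the script could handle both schedulers is sketched below. This is only an illustration, not the actual palace code: the function name make_node_file and the output location are hypothetical, and it assumes scontrol is on the PATH inside the Slurm job. Only SLURM_JOB_NODELIST, PBS_NODEFILE, and the scontrol show hostnames command are taken from the schedulers themselves.

```shell
# Hypothetical helper: print the path of a file with one hostname per line.
make_node_file() {
    # Illustrative output location; falls back to /tmp if TMPDIR is unset.
    node_file_out="${TMPDIR:-/tmp}/nodefile.txt"
    if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
        # Slurm: expand the compact list (e.g. d05-[41-42]) into a file.
        scontrol show hostnames "$SLURM_JOB_NODELIST" > "$node_file_out" || return 1
        echo "$node_file_out"
    elif [ -n "${PBS_NODEFILE:-}" ]; then
        # PBS: the variable already points to a node file.
        echo "$PBS_NODEFILE"
    else
        # Neither scheduler variable is set.
        return 1
    fi
}
```

Inside a job, something like NODE_FILE=$(make_node_file) && cat "$NODE_FILE" would then read hostnames from a real file under either scheduler.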
Script line referenced above: https://github.com/awslabs/palace/blob/6c180aa8a127f224a04b3fce69ef17b085fb14d6/scripts/palace#L168