argonne-lcf / GettingStarted

Collection of small examples for running on ALCF resources

value of NRANKS_PER_NODE differs here vs. online docs #2

Closed: zingale closed this issue 2 years ago

zingale commented 2 years ago

In the GPU example: https://github.com/argonne-lcf/GettingStarted/blob/master/Examples/Polaris/affinity_gpu/submit.sh, the script sets:

NRANKS_PER_NODE=8

but the example in the online docs sets it to 4 (https://www.alcf.anl.gov/support/user-guides/polaris/queueing-and-running-jobs/example-job-scripts/index.html).

Isn't 4 the correct number, since there are 4 GPUs per node?
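
For what it's worth, I'd have expected the relevant lines to look roughly like this (just a sketch; variable names assumed to follow the rest of the script, with ./hello_affinity standing in for the actual test binary):

NNODES=$(wc -l < ${PBS_NODEFILE})
NRANKS_PER_NODE=4    # one MPI rank per GPU; Polaris nodes have 4 A100s
NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))

mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} ./set_affinity_gpu_polaris.sh ./hello_affinity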

saforem2 commented 2 years ago

I believe you're right.

An easy fix would be to replace hard-coded values with something like:

NGPU_PER_RANK=$(nvidia-smi -L | wc -l)

in these two files (a quick sketch follows the list):

  1. Examples/Polaris/affinity_gpu/set_affinity_gpu_polaris.sh
  2. Examples/Polaris/affinity_gpu/submit.sh
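
Applied to submit.sh, that would boil down to something like the lines below (a sketch only; names illustrative rather than copied from the script):

# Count the GPUs actually visible on the node instead of hard-coding 4 or 8.
NGPUS_PER_NODE=$(nvidia-smi -L | wc -l)
NRANKS_PER_NODE=${NGPUS_PER_NODE}    # one MPI rank per GPU
NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))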

A (maybe too-specific) alternative would be to set these dynamically:

#!/bin/bash

if [[ $(hostname) == theta* ]]; then
    echo "┏━━━━━━━━━━━━━━━━━━━━━┓"
    echo "┃ Running on ThetaGPU ┃"
    echo "┗━━━━━━━━━━━━━━━━━━━━━┛"
    NRANKS=$(wc -l < ${COBALT_NODEFILE})
    HOSTFILE=${COBALT_NODEFILE}
    NGPU_PER_RANK=$(nvidia-smi -L | wc -l)
    NGPUS=$((${NRANKS}*${NGPU_PER_RANK}))
    MPI_COMMAND=$(which mpirun)
    MPI_FLAGS="-x LD_LIBRARY_PATH -x PATH -n ${NGPUS} -npernode ${NGPU_PER_RANK} --hostfile ${HOSTFILE}"
elif [[ $(hostname) == x* ]]; then
    echo "┏━━━━━━━━━━━━━━━━━━━━┓"
    echo "┃ Running on Polaris ┃"
    echo "┗━━━━━━━━━━━━━━━━━━━━┛"
    NRANKS=$(wc -l < ${PBS_NODEFILE})
    HOSTFILE=${PBS_NODEFILE}
    NGPU_PER_RANK=$(nvidia-smi -L | wc -l)
    NGPUS=$((${NRANKS}*${NGPU_PER_RANK}))
    MPI_COMMAND=$(which mpiexec)
    MPI_FLAGS="-n ${NGPUS} --ppn ${NGPU_PER_RANK} --envall --hostfile ${HOSTFILE}"
else
    echo "HOSTNAME: $(hostname)"
    # Unrecognized host: exit rather than launch with unset MPI variables.
    exit 1
fi

${MPI_COMMAND} ${MPI_FLAGS} "$@"
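
Saved as, say, launch.sh (name hypothetical), it could be used as a drop-in launcher on either machine, e.g.:

./launch.sh ./hello_affinity

with ./hello_affinity standing in for whatever executable and arguments you would normally hand to mpirun/mpiexec.
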
cjknight commented 2 years ago

Thanks for calling this out!

Correct, there are 4 GPUs on Polaris nodes. The 8 was chosen here simply as an illustrative example of what the little helper script is doing. If an application isn't using MPS or MIG mode, then, yes, it will likely use only a single MPI rank per GPU, and a statement to that effect should be added to avoid confusion here. We haven't documented MPS in the docs yet, so we're not showing any "over-subscription" examples there.

I'll see if I can get MPS working in the examples, update this submit.sh to use just 4 ranks, and create a new submit_mps.sh example script that binds multiple ranks per GPU (and similarly for MIG).
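
Roughly, what I have in mind for submit_mps.sh is something like the sketch below (the MPS daemon start/stop follows NVIDIA's generic recipe and the final example may wrap it differently; ./hello_affinity is a placeholder):

#!/bin/bash -l
NNODES=$(wc -l < ${PBS_NODEFILE})
NRANKS_PER_NODE=8    # 2 MPI ranks per GPU, over-subscribed via MPS
NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))

# Start one MPS control daemon per node.
mpiexec -n ${NNODES} --ppn 1 nvidia-cuda-mps-control -d

# Run the application through the GPU-affinity helper as before.
mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --envall ./set_affinity_gpu_polaris.sh ./hello_affinity

# Shut the daemons down at the end of the job.
mpiexec -n ${NNODES} --ppn 1 bash -c 'echo quit | nvidia-cuda-mps-control'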

cjknight commented 2 years ago

@saforem2 I went ahead and simplified things for 4 ranks. The MPS script submit_mps.sh with 8 MPI ranks per node looks like it's working as expected. I can fix anything else that catches your eye.