hamiddashti opened this issue 5 years ago
I've only run pestpp on a slurm cluster a few times, but it scaled well (up to 1000 workers). @mwtoews provided me with the slurm scripts - maybe he has some insights about your scripts?
I haven't seen any scaling issues with slurm, but perhaps we're using different paradigms.
My pest[pp*|_hp] runs with Slurm consist of one master job and one or more worker jobs. The master normally requests more RAM for inversions, and the workers only request the amount of RAM needed to run the simulations, which is often different (usually smaller). The workers are submitted as a multiple program configuration with srun --multi-prog multi.conf.
Here are some partial bits of the four files used to orchestrate Slurm runs: master.sl (the sbatch job for the master), master.sh (the script that job runs), workers.sl (the sbatch job for the workers), and workers.sh (the per-task worker script launched via multi.conf).
master.sl
#!/bin/bash
#SBATCH --job-name=master
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=4G
#...
# Kick off master
srun master.sh pestpp-ies /path/to/master file.pst
master.sh
#!/bin/bash
set -e
pstbin=$1
pstdir=$2
pstfile=$3
pstflg=$4
cd "$pstdir"
# Get available port from host, write master.txt for workers
masterport=`python - <<EOF
import socket
s = socket.socket()
s.bind(('', 0))
print(s.getsockname()[1])
s.close()
EOF`
echo `hostname -s`:$masterport > master.txt
echo "Starting master: $pstbin $pstfile $pstflg /H :$masterport"
$pstbin $pstfile $pstflg /H :$masterport
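For reference, master.txt ends up holding a single hostname:port pair for the workers to read; with a made-up node name and port it would look something like:
node042:45871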
workers.sl
#!/bin/bash
#SBATCH --job-name=workers-1
#SBATCH --ntasks=100
#SBATCH --mem-per-cpu=300M
#...
export PST_BIN=pestpp-ies
export MASTER_DIR=/path/to/master
export PST_FILE=file.pst
export PUT_DIR=/path/to/source/files
# Create a file for multi-prog srun
cd /scratch/workers/1 # this worker directory needs to be adjusted for each submission
touch multi.conf
for (( N=0; N<$SLURM_NTASKS; N++)); do
echo "$N workers.sh $PST_BIN $MASTER_DIR $PST_FILE $PUT_DIR $N" >> multi.conf
done
# Kick off workers
srun --multi-prog multi.conf
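For illustration, with the exports above the generated multi.conf gets one line per task, along the lines of:
0 workers.sh pestpp-ies /path/to/master file.pst /path/to/source/files 0
1 workers.sh pestpp-ies /path/to/master file.pst /path/to/source/files 1
...
99 workers.sh pestpp-ies /path/to/master file.pst /path/to/source/files 99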
workers.sh
#!/bin/bash
set -e
pstbin=$1
pstdir=$2
pstfile=$3
putdir=$4
instance=$5
cp -rp $putdir $instance
cd $instance
echo "Worker `hostname -s` running in `pwd`"
# Get master hostname and port
masterpath=$pstdir/master.txt
masterhostport=$(cat "$masterpath")
echo "Starting worker: $pstbin $pstfile /H $masterhostport"
$pstbin $pstfile /H $masterhostport
So I'd usually do a sbatch master.sl to start the master, then do sbatch workers.sl one or more times after the master has started with a separate suite of worker directories, and often different ntasks, depending on how busy the HPC is. Hope this helps!
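To spell out that sequence (the paths and edits here are just the placeholders from the scripts above):
sbatch master.sl
# once the master job is running and master.txt exists in the master directory:
sbatch workers.sl
# for more workers, point workers.sl at a fresh worker directory (and adjust --ntasks if needed), then submit again:
sbatch workers.sl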
Thank you @jtwhite79 and @mwtoews. We are working on it and hopefully it will be resolved. I'll keep you posted.
@hamiddashti did you get the pestpp/SLURM setup working well? I'm running into possibly the same issue, where it is running slower as a SLURM job than at an interactive prompt. I will try @mwtoews's solution but I thought maybe you'd found something simpler. Thanks-
This issue might not really be related to pestpp itself; it might be more of a problem with my bash scripting or our cluster setup. I'm using a cluster with 16 nodes, and each node carries 28 cores. I can run pestpp-gsa in parallel using the worker/slave setup on one node with the slurm script below:
#!/bin/bash
#SBATCH -n 1 # total number of tasks requested
#SBATCH --cpus-per-task=1 # cpus to allocate per task
#SBATCH -p shortq # queue (partition) -- defq, eduq, gpuq.
#SBATCH -t 12:00:00 # run time (hh:mm:ss) - 12.0 hours in this case
cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws/master
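# start the pestpp-gsa master in the background, listening on port 4004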
pestpp-gsa gsa_karun /h :4004 &
MASTER_PID=$!
cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws
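# launch one worker per wrk directory (20 in total), each connecting to the local master on port 4004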
parallel -i bash -c "cd {} ; pestpp-gsa gsa_karun /h 127.0.0.1:4004" -- \
wrk1 wrk2 wrk3 wrk4 wrk5 wrk6 wrk7 wrk8 wrk9 wrk10 \
wrk11 wrk12 wrk13 wrk14 wrk15 wrk16 wrk17 wrk18 wrk19 wrk20
kill ${MASTER_PID}
The above script, which uses 20 cores of one node, works fine. I then tried to use multiple nodes with more workers using the following script:
#!/bin/bash
#SBATCH -N 4
#SBATCH --tasks-per-node=28
#SBATCH -p defq
#SBATCH -t 120:00:00
ulimit -u 9999
ulimit -s unlimited
ulimit -v unlimited
cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws/master
pestpp-gsa gsa_karun /h :4004 &
MASTER_PID=$!
LEADER=$SLURMD_NODENAME
NODELIST=($(scontrol show hostname $SLURM_JOB_NODELIST))
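# fan 112 workers out round-robin across the 4 allocated nodes via ssh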
FOLDERS=($(seq 1 112))
for i in $(seq 0 111); do
ssh -f ${NODELIST[$(echo "$i % 4" | bc)]} "cd /home/hdashti/scratch/ED_BSU/old_ed2/26jan17/ED/working_morris_ws/wrk${FOLDERS[$i]} ; nohup pestpp-gsa gsa_karun /h ${LEADER}:4004 > worker.log &"
done
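# block until the background master process finishes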
wait ${MASTER_PID}
Although I'm using 112 cores now, it takes a lot longer for pestpp to finish. I was wondering, did anyone else run into the same problem? Or am I missing something here? I'm posting this here because I'm not sure if it's a pestpp problem or our cluster setup. Thanks