schristley opened 6 years ago
As you probably don't have access to TACC, here are the test files. This is the job.sh:
#!/bin/bash
#SBATCH -J repcalc_bcr4_test
#SBATCH -o job.out
#SBATCH -e job.err
#SBATCH -t 01:00:00
#SBATCH -p skx-normal
#SBATCH -N 4 -n 48
#SBATCH -A RepServer
module purge
module load TACC
module load launcher
module load python
rm -f joblist
touch joblist
echo "echo 1" >> joblist
echo "echo 2" >> joblist
echo "echo 3" >> joblist
echo "echo 4" >> joblist
echo "echo 5" >> joblist
echo "echo 6" >> joblist
echo "echo 7" >> joblist
echo "echo 8" >> joblist
echo "echo 9" >> joblist
echo "echo 10" >> joblist
echo "echo 11" >> joblist
echo "echo 12" >> joblist
# Use launcher to run multiple tasks per node
export LAUNCHER_WORKDIR=$PWD
export LAUNCHER_PPN=4
export LAUNCHER_JOB_FILE=joblist
export LAUNCHER_SCHED=interleaved
$LAUNCHER_DIR/paramrun
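With 4 nodes and LAUNCHER_PPN=4, Launcher starts 16 tasks for a 12-line joblist. The sketch below assumes that interleaved scheduling gives task t (0-based) the job lines t+1, t+1+16, and so on, stopping at the last line. That mapping is my assumption about Launcher's scheduler, not something taken from its source. Under it, tasks 0 to 11 each run one job and tasks 12 to 15 should start nothing.

```shell
#!/bin/bash
# Hypothetical sketch: print the job lines each task would receive under an
# interleaved (round-robin) mapping. ASSUMPTION: task t (0-based) gets lines
# t+1, t+1+ntasks, t+1+2*ntasks, ... as long as they do not exceed njobs.
interleaved_map() {
  local ntasks=$1 njobs=$2 t j jobs
  for ((t = 0; t < ntasks; t++)); do
    jobs=""
    for ((j = t + 1; j <= njobs; j += ntasks)); do
      jobs+=" $j"
    done
    # A task with no job line left should start nothing.
    echo "task $t:${jobs:- (none)}"
  done
}

# 4 hosts x LAUNCHER_PPN=4 gives 16 tasks; joblist has 12 lines.
interleaved_map $((4 * 4)) 12
```

The function name interleaved_map is hypothetical; it only models the expected assignment and does not call Launcher.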
Here is the output from running the job:
Launcher: Setup complete.
------------- SUMMARY ---------------
Number of hosts: 4
Working directory: /scratch/01114/vdj/vdj/launcher-test
Processes per host: 4
Total processes: 16
Total jobs: 12
Scheduling method: interleaved
-------------------------------------
Launcher: Starting parallel tasks...
Launcher: Task 0 running job 1 on c479-111.stampede2.tacc.utexas.edu (echo 1)
Launcher: Task 3 running job 4 on c479-111.stampede2.tacc.utexas.edu (echo 4)
1
4
Launcher: Task 1 running job 2 on c479-111.stampede2.tacc.utexas.edu (echo 2)
2
Launcher: Task 2 running job 3 on c479-111.stampede2.tacc.utexas.edu (echo 3)
3
Launcher: Job 2 completed in 0 seconds.
Launcher: Job 1 completed in 0 seconds.
Launcher: Job 3 completed in 0 seconds.
Launcher: Job 4 completed in 0 seconds.
Launcher: Task 1 done. Exiting.
Launcher: Task 0 done. Exiting.
Launcher: Task 3 done. Exiting.
Launcher: Task 2 done. Exiting.
Launcher: Task 5 running job 6 on c479-112.stampede2.tacc.utexas.edu (echo 6)
Launcher: Task 6 running job 7 on c479-112.stampede2.tacc.utexas.edu (echo 7)
Launcher: Task 7 running job 8 on c479-112.stampede2.tacc.utexas.edu (echo 8)
Launcher: Task 4 running job 5 on c479-112.stampede2.tacc.utexas.edu (echo 5)
6
7
8
5
Launcher: Task 10 running job 11 on c490-084.stampede2.tacc.utexas.edu (echo 11)
11
Launcher: Task 8 running job 9 on c490-084.stampede2.tacc.utexas.edu (echo 9)
9
Launcher: Task 13 running job 14 on c490-091.stampede2.tacc.utexas.edu (echo 12)
Launcher: Task 15 running job 16 on c490-091.stampede2.tacc.utexas.edu (echo 12)
12
12
Launcher: Task 14 running job 15 on c490-091.stampede2.tacc.utexas.edu (echo 12)
12
Launcher: Task 11 running job 12 on c490-084.stampede2.tacc.utexas.edu (echo 12)
12
Launcher: Task 9 running job 10 on c490-084.stampede2.tacc.utexas.edu (echo 10)
10
Launcher: Task 12 running job 13 on c490-091.stampede2.tacc.utexas.edu (echo 12)
12
Launcher: Job 5 completed in 0 seconds.
Launcher: Job 7 completed in 0 seconds.
Launcher: Job 8 completed in 0 seconds.
Launcher: Job 11 completed in 0 seconds.
Launcher: Job 6 completed in 0 seconds.
Launcher: Job 9 completed in 0 seconds.
Launcher: Task 7 done. Exiting.
Launcher: Job 14 completed in 0 seconds.
Launcher: Task 6 done. Exiting.
Launcher: Task 4 done. Exiting.
Launcher: Job 12 completed in 0 seconds.
Launcher: Job 16 completed in 0 seconds.
Launcher: Job 10 completed in 0 seconds.
Launcher: Task 10 done. Exiting.
Launcher: Task 5 done. Exiting.
Launcher: Job 15 completed in 0 seconds.
Launcher: Task 8 done. Exiting.
Launcher: Job 13 completed in 0 seconds.
Launcher: Task 13 done. Exiting.
Launcher: Task 15 done. Exiting.
Launcher: Task 11 done. Exiting.
Launcher: Task 9 done. Exiting.
Launcher: Task 14 done. Exiting.
Launcher: Task 12 done. Exiting.
Launcher: Done. Job exited without errors
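To spot the extra jobs without reading the whole log, the job numbers in the "running job N" lines can be compared with the length of joblist. This is a minimal sketch: the function name check_launcher_log is hypothetical, and it assumes the log line format shown above.

```shell
#!/bin/bash
# Hypothetical helper: report any job numbers in a Launcher log that exceed
# the number of lines in the job file.
check_launcher_log() {
  local log=$1 joblist=$2 njobs extra
  # $(( )) strips the padding some wc implementations add.
  njobs=$(( $(wc -l < "$joblist") ))
  # Pull N out of lines like "Launcher: Task 3 running job 4 on host (echo 4)",
  # keep only N > njobs, and join them with spaces.
  extra=$(grep -o 'running job [0-9][0-9]*' "$log" \
    | awk -v n="$njobs" '$3 > n { print $3 }' \
    | sort -n | paste -sd' ' -)
  if [ -n "$extra" ]; then
    echo "joblist has $njobs lines; extra jobs run: $extra"
    return 1
  fi
  echo "OK: no job number exceeds $njobs"
}
```

Run it from the job's working directory as `check_launcher_log job.out joblist`. For the output above it should print `joblist has 12 lines; extra jobs run: 13 14 15 16` and return a non-zero status.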
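The job file has only 12 lines ("Total jobs: 12"), yet Launcher ran 16 jobs: tasks 12 to 15 ran nonexistent jobs 13 to 16, each repeating the last command (echo 12). I guess this is a duplicate of #16.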
I filed a TACC ticket (Ticket #43327), as I don't know whether this is a launcher bug or just an issue with TACC's current version of launcher.