heatherkellyucl opened 1 year ago
Related issues (because they depend on OpenMPI 4.1.x):
As part of my HOOMD-blue investigation, I have ended up with an openmpi-4.1.4-gcc-4.9.2-j5watuf
module in my Spack install on Young.
spack find -ldf
# the openmpi part:
j5watuf openmpi@4.1.4%gcc
d2qmyox hwloc@2.8.0%gcc
5lmv3ne libpciaccess@0.16%gcc
6ztmz4g util-macros@1.19.3%gcc
rg74oss libxml2@2.10.3%gcc
tvlwknx libiconv@1.16%gcc
zqtvwgj xz@5.2.7%gcc
bwqnwsu ncurses@6.3%gcc
vybwgro numactl@2.0.14%gcc
mhvtfkz autoconf@2.69%gcc
7tai33o automake@1.16.5%gcc
s46real libtool@2.4.7%gcc
pc7f4sq m4@1.4.19%gcc
vevpgoz diffutils@3.8%gcc
oid6oac libsigsegv@2.13%gcc
tnuhpj4 openssh@9.1p1%gcc
q3nmy4f krb5@1.20.1%gcc
3jole6e bison@3.8.2%gcc
rq32ugc gettext@0.21.1%gcc
n5p4sxj tar@1.34%gcc
3zlhv77 pigz@2.7%gcc
sifpfu7 zstd@1.5.2%gcc
7gtuxrg libedit@3.1-20210216%gcc
pege64j libxcrypt@4.4.31%gcc
iaefdyl openssl@1.1.1s%gcc
bulswgh ca-certificates-mozilla@2022-10-11%gcc
lksmiyk perl@5.36.0%gcc
txaxkab berkeley-db@18.1.40%gcc
i7forfu bzip2@1.0.8%gcc
ikjdrtq gdbm@1.23%gcc
g7ybkny readline@8.1.2%gcc
bybst4r pkgconf@1.8.0%gcc
2aqjdr4 pmix@4.1.2%gcc
6yztqjc libevent@2.1.12%gcc
bwxsq6s zlib@1.2.13%gcc
I am testing a 2-node job on Young with c_mpi_pi from the pi_examples repo.
#!/bin/bash -l
#$ -l h_rt=0:10:0
#$ -l mem=1G
#$ -pe mpi 80
#$ -N pi_80_ompi-4.1.4
#$ -cwd
#$ -P Test
#$ -A Test_allocation
module unload -f compilers mpi
module load compilers/gnu/4.9.2
module use /home/cceahke/Scratch/spack/spack/share/spack/modules/linux-rhel7-broadwell
module load openmpi-4.1.4-gcc-4.9.2-j5watuf
ompi_info
gerun ./mpi_pi
(We probably do not need the compiler module loaded there at all.)
I forgot to export GERUN_LAUNCHER=openmpi-sge,
so gerun decided I had no MPI implementation. (I might instead need to set GERUN_LAUNCHER=openmpi if the build doesn't have SGE integration.)
Ok, it turns out OpenMPI was built with --without-sge
by default, so the job only ran on one node.
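A quick way to check whether a given OpenMPI build has SGE support is to look for the gridengine components in the ompi_info output (a sketch; the exact component listing depends on the OpenMPI version):

```shell
# If OpenMPI was configured with SGE support, gridengine MCA components
# (e.g. the ras resource-allocation component) show up here; with
# --without-sge this grep prints nothing.
ompi_info | grep -i gridengine
```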
This time I set export GERUN_LAUNCHER=openmpi
and it worked!
GERun: GErun command being run:
GERun: mpirun -machinefile /tmpdir/job/820066.undefined/machines -np 80 ./mpi_pi
Calculating PI using 80 processes...
Proc 18 says hello, is going to calculate slice 225000000-237499999
Proc 9 says hello, is going to calculate slice 112500000-124999999
Proc 34 says hello, is going to calculate slice 425000000-437499999
Proc 2 says hello, is going to calculate slice 25000000-37499999
Proc 57 says hello, is going to calculate slice 712500000-724999999
Proc 25 says hello, is going to calculate slice 312500000-324999999
Proc 58 says hello, is going to calculate slice 725000000-737499999
Proc 11 says hello, is going to calculate slice 137500000-149999999
Proc 0 says hello, is going to calculate slice 0-12499999
...
Proc 23 says hello, is going to calculate slice 287500000-299999999
Proc 39 says hello, is going to calculate slice 487500000-499999999
Proc 55 says hello, is going to calculate slice 687500000-699999999
The value of PI is 3.14159240526447
The time to calculate PI was 0.0770059 seconds
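For reference, what gerun effectively does when OpenMPI lacks SGE integration can be sketched as: expand the SGE $PE_HOSTFILE (one "host slots queue range" line per node) into a machinefile with one line per slot, then pass that to mpirun explicitly. The node names and slot counts below are made up for illustration:

```shell
# Fake two-node, 2-slots-per-node PE_HOSTFILE (values are illustrative):
PE_HOSTFILE=$(mktemp)
printf 'node-001 2 Test@node-001 UNDEFINED\nnode-002 2 Test@node-002 UNDEFINED\n' > "$PE_HOSTFILE"

# Print each hostname once per slot, as mpirun's -machinefile expects.
awk '{for (i = 0; i < $2; i++) print $1}' "$PE_HOSTFILE" > machines
cat machines

# On the cluster, the launch would then look like:
#   mpirun -machinefile machines -np "$NSLOTS" ./mpi_pi
```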
I suppose one question is whether we should include SGE integration if we are replacing the scheduler. Gerun will look the same either way as long as we set GERUN_LAUNCHER appropriately; the difference is that people using mpirun directly will need to specify $TMPDIR/machines
as their machinefile, whereas with SGE integration they don't need one.
Alternatively, we could rebuild things at that point with $OtherScheduler integration. (Or can we add both and all will be well?)
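If we do want scheduler integration, the Spack openmpi package has a schedulers variant, so the rebuild could look something like the following (a sketch; the accepted variant values and combinations are defined by the Spack package recipe, so check `spack info openmpi` first):

```shell
# Rebuild with SGE integration enabled (schedulers=sge should configure
# the build with SGE support instead of the default --without-sge).
spack install openmpi@4.1.4 schedulers=sge %gcc@12.2.0

# The variant is list-valued, so in principle more than one scheduler can
# be requested at once, e.g. schedulers=sge,slurm (assumption: whether a
# given combination is allowed depends on the package recipe).
```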
To build OpenMPI with my GCC 12.x build, after adding the compiler to Spack, I am running:
spack install openmpi %gcc@12.2.0 2>&1 | tee OpenMPI-build.log
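The "adding the compiler to Spack" step can be sketched like this (the GCC install path here is hypothetical, not the actual one on Young):

```shell
# Point Spack at an existing GCC 12.2.0 install so it is registered as a
# usable compiler (the path below is a placeholder, not the real location).
spack compiler find /path/to/gcc/12.2.0

# Confirm gcc@12.2.0 now appears in the list of known compilers.
spack compilers
```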
I've now got the following installed on Young:
-- linux-rhel7-cascadelake / gcc@12.2.0 -------------------------
autoconf@2.69 libevent@2.1.12 openssl@1.1.1s
automake@1.16.5 libiconv@1.16 perl@5.36.0
berkeley-db@18.1.40 libpciaccess@0.16 pigz@2.7
bison@3.8.2 libsigsegv@2.13 pkgconf@1.8.0
bzip2@1.0.8 libtool@2.4.7 pmix@4.1.2
ca-certificates-mozilla@2022-10-11 libxcrypt@4.4.33 readline@8.1.2
diffutils@3.8 libxml2@2.10.3 tar@1.34
gdbm@1.23 m4@1.4.19 util-macros@1.19.3
gettext@0.21.1 ncurses@6.3 xz@5.2.7
hwloc@2.8.0 numactl@2.0.14 zlib@1.2.13
krb5@1.20.1 openmpi@4.1.4 zstd@1.5.2
libedit@3.1-20210216 openssh@9.1p1
and:
module avail openmpi
- /lustre/scratch/ccaabaa/apps/spack-test/spack/share/spack/modules/linux-rhel7-cascadelake -
openmpi-4.1.4-gcc-12.2.0-irwlhs3
The OpenMPI build on Myriad failed partway through; I need to check tomorrow.
Check multi-node on Young/Kathleen/Michael in particular. (We've had one working on Myriad and Thomas only).