Closed leifdenby closed 3 years ago
Debug commands I'm using on ARCHER2 (for my own reference):
Run MONC inside gdb4hpc:
$> gdb4hpc
gdb all> launch --args="--config=tests/straka_short.mcf --checkpoint_file=checkpoint_files/straka_dump.nc" --launcher-args="--partition=standard --qos=standard --tasks-per-node=2 --exclusive --export=all" $monc{2} ./build/bin/monc_driver.exe
Currently I'm stuck with an issue with a call to MPI_Alltoallv
earlcd@uan01:/work/ta009/ta009/earlcd/git-repos/monc> fcm make -f fcm-make/monc-cray-cray.cfg
[init] make # 2020-12-15T15:15:52Z
[info] FCM 2019.05.0 (/home2/home/ta009/ta009/earlcd/fcm-2019.09.0)
[init] make config-parse # 2020-12-15T15:15:52Z
[info] config-file=/lus/cls01095/work/ta009/ta009/earlcd/git-repos/monc/fcm-make/monc-cray-cray.cfg
[info] config-file= - /lus/cls01095/work/ta009/ta009/earlcd/git-repos/monc/fcm-make/comp-cray-2107.cfg
[info] config-file= - /lus/cls01095/work/ta009/ta009/earlcd/git-repos/monc/fcm-make/env-cray.cfg
[info] config-file= - /lus/cls01095/work/ta009/ta009/earlcd/git-repos/monc/fcm-make/monc-build.cfg
[done] make config-parse # 0.0s
[init] make dest-init # 2020-12-15T15:15:52Z
[info] dest=earlcd@uan01:/lus/cls01095/work/ta009/ta009/earlcd/git-repos/monc
[info] mode=incremental
[done] make dest-init # 0.0s
[init] make extract # 2020-12-15T15:15:52Z
[info] location monc: 0: /lus/cls01095/work/ta009/ta009/earlcd/git-repos/monc
[info] dest: 381 [U unchanged]
[info] source: 381 [U from base]
[done] make extract # 0.4s
[init] make preprocess # 2020-12-15T15:15:53Z
[info] sources: total=381, analysed=0, elapsed-time=0.2s, total-time=0.0s
[info] target-tree-analysis: elapsed-time=0.0s
[info] install targets: modified=0, unchanged=8, failed=0, total-time=0.0s
[info] process targets: modified=0, unchanged=172, failed=0, total-time=0.0s
[info] TOTAL targets: modified=0, unchanged=180, failed=0, elapsed-time=0.2s
[done] make preprocess # 0.8s
[init] make build # 2020-12-15T15:15:54Z
[info] sources: total=381, analysed=0, elapsed-time=0.1s, total-time=0.0s
[info] target-tree-analysis: elapsed-time=0.1s
[info] compile targets: modified=120, unchanged=3, failed=0, total-time=176.7s
[info] compile+ targets: modified=112, unchanged=7, failed=0, total-time=0.5s
[info] link targets: modified=1, unchanged=0, failed=0, total-time=0.5s
[info] TOTAL targets: modified=233, unchanged=10, failed=0, elapsed-time=178.1s
[done] make build # 178.3s
[done] make # 179.6s
earlcd@uan01:/work/ta009/ta009/earlcd/git-repos/monc> sbatch utils/archer2/submonc.slurm
Submitted batch job 59769
earlcd@uan01:/work/ta009/ta009/earlcd/git-repos/monc> cat slurm-59769.out
Unloading /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env-profile
Loading cpe-cray
Loading cce/10.0.4
Loading craype/2.7.2
Loading craype-x86-rome
Loading libfabric/1.11.0.0.233
Loading craype-network-ofi
Loading cray-dsmml/0.1.2
Loading perftools-base/20.10.0
Loading xpmem/2.2.35-7.0.1.0_1.9__gd50fabf.shasta
Loading cray-mpich/8.0.16
Loading cray-libsci/20.10.1.2
Loading /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env
Currently Loaded Modulefiles:
1) cpe-cray
2) cce/10.0.4(default)
3) craype/2.7.2(default)
4) craype-x86-rome
5) libfabric/1.11.0.0.233(default)
6) craype-network-ofi
7) cray-dsmml/0.1.2(default)
8) perftools-base/20.10.0(default)
9) xpmem/2.2.35-7.0.1.0_1.9__gd50fabf.shasta(default)
10) cray-mpich/8.0.16(default)
11) cray-libsci/20.10.1.2(default)
12) /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env
13) epcc-job-env
14) cray-netcdf/4.7.4.2(default)
15) cray-fftw/3.3.8.8(default)
16) cray-hdf5/1.12.0.2(default)
MPICH ERROR [Rank 1] [job id 59769.0] [Tue Dec 15 15:22:06 2020] [unknown] [nid001139] - Abort(403275522) (rank 1 in comm 0): Fatal error in PMPI_Alltoallv: Invalid count, error stack:
PMPI_Alltoallv(410): MPI_Alltoallv(sbuf=0x9183600, scnts=0x45becc0, sdispls=0x47f7a00, MPI_DOUBLE_PRECISION, rbuf=0x92888c0, rcnts=0x47f6540, rdispls=0x47f4040, datatype=MPI_DOUBLE_PRECISION, comm=MPI_COMM_SELF) failed
PMPI_Alltoallv(351): Negative count, value is -1207723264
aborting job:
Fatal error in PMPI_Alltoallv: Invalid count, error stack:
PMPI_Alltoallv(410): MPI_Alltoallv(sbuf=0x9183600, scnts=0x45becc0, sdispls=0x47f7a00, MPI_DOUBLE_PRECISION, rbuf=0x92888c0, rcnts=0x47f6540, rdispls=0x47f4040, datatype=MPI_DOUBLE_PRECISION, comm=MPI_COMM_SELF) failed
PMPI_Alltoallv(351): Negative count, value is -1207723264
[INFO] MONC running with 1 processes, 1 IO server(s)
[WARN] No enabled configuration for component ideal_squall therefore disabling this
[WARN] No enabled configuration for component kid_testcase therefore disabling this
[WARN] Run order callback for component tank_experiments at stage initialisation not specified
[WARN] Run order callback for component tank_experiments at stage finalisation not specified
[WARN] Defaulting to one dimension decomposition due to solution size too small
[INFO] Decomposed 1 processes via 'OneDim' into z=1 y=1 x=1
[INFO] 3D system; z=65, y=512, x=2
srun: error: nid001139: task 1: Exited with exit code 255
srun: Terminating job step 59769.0
slurmstepd: error: *** STEP 59769.0 ON nid001139 CANCELLED AT 2020-12-15T15:22:06 ***
srun: error: nid001139: task 0: Terminated
srun: Force Terminated job step 59769.0
I've tried compiling with fcm-make/comp-cray-2107-debug.cfg and using gdb4hpc to identify the issue. Within gdb4hpc I'm stuck, since I don't get any output when trying to print local variables:
dbg all> launch --args="--config=tests/straka_short.mcf --checkpoint_file=checkpoint_files/straka_dump.nc" --launcher-args="--partition=standard --qos=standard --tasks-per-node=2 --exclusive --export=all" $monc{2} ./build/bin/monc_driver.exe
Starting application, please wait...
Creating MRNet communication network...
Waiting for debug servers to attach to MRNet communications network...
Timeout in 400 seconds. Please wait for the attach to complete.
Number of dbgsrvs connected: [1]; Timeout Counter: [0]
Number of dbgsrvs connected: [1]; Timeout Counter: [1]
Number of dbgsrvs connected: [2]; Timeout Counter: [0]
Finalizing setup...
Launch complete.
monc{0..1}: Initial breakpoint, monc_driver at /lus/cls01095/work/ta009/ta009/earlcd/git-repos/monc/preprocess/src/monc/monc_driver.F90:16
dbg all> break pencilfft.F90:360
...
dbg all> print source_data
monc{0}: *** The application is running
dbg all> print size(source_data)
Hi @leifdenby: since these are MPI issues, I thought the changes that Chris applied for ARC4 may be worth exploring, in case you have not done so yet.
Great idea @sjboeing! I'll give this a try
Unfortunately the fixes introduced for ARC4 don't appear to have fixed the issue @sjboeing. But I have an idea what the issue might be. I'll put my testing in separate comments below
Compiling with the Cray Fortran compiler (optimised) and running:
This run-time error suggests to me that the routine calculating the size of the buffer used for the MPI communication is doing the calculation incorrectly.
I should also note that when compiling with debug flags (using fcm-make/monc-cray-cray-debug.cfg, which does compile) the SLURM job simply aborts (the log doesn't make clear why).
Compiling with the GNU Fortran compiler: does not compile.
With the GNU compiler MONC fails to compile. As far as I can see, all the errors relate to incorrect datatypes being passed to MPI-related subroutines. I think these are bugs, and fixing them may resolve the issue we are having at runtime. It is possible that the GNU compiler is simply stricter and catches these bugs at compile time.
Thoughts @sjboeing ?
Digging a little further, I've added some print statements. It appears that the counts are calculated incorrectly (I'm compiling with Cray Fortran again here):
earlcd@uan01:~/work/monc> cat slurm-77539.out
Unloading /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env-profile
Loading cpe-cray
Loading cce/10.0.4
Loading craype/2.7.2
Loading craype-x86-rome
Loading libfabric/1.11.0.0.233
Loading craype-network-ofi
Loading cray-dsmml/0.1.2
Loading perftools-base/20.10.0
Loading xpmem/2.2.35-7.0.1.0_1.9__gd50fabf.shasta
Loading cray-mpich/8.0.16
Loading cray-libsci/20.10.1.2
Loading bolt/0.7
Loading /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env
Loading epcc-job-env
Loading requirement: bolt/0.7
Currently Loaded Modulefiles:
1) cpe-cray
2) cce/10.0.4(default)
3) craype/2.7.2(default)
4) craype-x86-rome
5) libfabric/1.11.0.0.233(default)
6) craype-network-ofi
7) cray-dsmml/0.1.2(default)
8) perftools-base/20.10.0(default)
9) xpmem/2.2.35-7.0.1.0_1.9__gd50fabf.shasta(default)
10) cray-mpich/8.0.16(default)
11) cray-libsci/20.10.1.2(default)
12) bolt/0.7
13) /work/y07/shared/archer2-modules/modulefiles-cse/epcc-setup-env
14) epcc-job-env
MPICH ERROR [Rank 2] [job id 77539.0] [Tue Jan 26 13:07:48 2021] [unknown] [nid001199] - Abort(1007255298) (rank 2 in comm 0): Fatal error in PMPI_Alltoallv: Invalid count, error stack:
PMPI_Alltoallv(410): MPI_Alltoallv(sbuf=0x5ce63c0, scnts=0x45fba80, sdispls=0x45fb180, MPI_DOUBLE_PRECISION, rbuf=0x5d26a00, rcnts=0x45fa400, rdispls=0x46279c0, datatype=MPI_DOUBLE_PRECISION, comm=MPI_COMM_SELF) failed
PMPI_Alltoallv(351): Negative count, value is -919842048
aborting job:
Fatal error in PMPI_Alltoallv: Invalid count, error stack:
PMPI_Alltoallv(410): MPI_Alltoallv(sbuf=0x5ce63c0, scnts=0x45fba80, sdispls=0x45fb180, MPI_DOUBLE_PRECISION, rbuf=0x5d26a00, rcnts=0x45fa400, rdispls=0x46279c0, datatype=MPI_DOUBLE_PRECISION, comm=MPI_COMM_SELF) failed
PMPI_Alltoallv(351): Negative count, value is -919842048
debug send_sizes 4352, 3*4096
debug recv_sizes 4*4096
debug send_sizes 16448
debug recv_sizes 16448
debug send_sizes 32896
debug recv_sizes -919842048
MPICH ERROR [Rank 1] [job id 77539.0] [Tue Jan 26 13:07:48 2021] [unknown] [nid001199] - Abort(1007255298) (rank 1 in comm 0): Fatal error in PMPI_Alltoallv: Invalid count, error stack:
PMPI_Alltoallv(410): MPI_Alltoallv(sbuf=0x5ca16c0, scnts=0x47f8200, sdispls=0x47f6d40, MPI_DOUBLE_PRECISION, rbuf=0x5ce5cc0, rcnts=0x47f1400, rdispls=0x47ee200, datatype=MPI_DOUBLE_PRECISION, comm=MPI_COMM_SELF) failed
PMPI_Alltoallv(351): Negative count, value is -1300475136
I've gotten as far as identifying that determine_offsets_from_size (also in the compiled MONC docs) is in charge of computing these offsets. I think the next step will be to work out why this subroutine isn't calculating positive values (as it should).
Hi @Leif, this looks like it is not really trivial. Two things you may try: 1) disable the ioserver component, just to see if the issue has to do with it (just use enable_io_server=.false.?); 2) run a BOMEX case. I think the Straka case is effectively 2D, and may not be as well accommodated as runs on a 3D domain.
Thanks for the suggestions @sjboeing, I've tried both and the issue persists.
Hi @leifdenby Leif, just wondering what choice of moncs_per_io is set. Is this a version that still uses FFTW rather than the new MOSRS HEAD that uses FFTE? The message seemed to indicate fft_pencil in earlier output. Note ARCHER2 has 128 cores per node and 8 NUMA regions per node, so it is best to have 1 IO server per NUMA region: that is 15 moncs per IO server. [Sorry if you already knew that; I did not look at your case.] Alternatively do 63 moncs per IO so that there is one IO server per socket.
I just re-read some of your compile woes. NOTE: GCC 10 does not like the fact that MPI calls can use any data type, so we have to apply "-fallow-argument-mismatch"; I found that back in November and the ARCHER2 team added it to the documentation on building: https://docs.archer2.ac.uk/user-guide/dev-environment/
Did you solve this yet? I see the only ref to mpi_alltoallv appears to be in components/fftsolver/src/pencilfft.F90 I was (am) working with the MOSRS code at r8166 where Adrian seems to make most of his branches start. Perhaps I should turn my attention to this repo.
I haven't no :cry: The farthest I've gotten is producing a branch (see https://github.com/Leeds-MONC/monc/pull/38) which contains all the commits that Adrian has made on MOSRS where he is working on ARCHER2 fixes. As you know this branch includes a lot of changes and as it stands also reverses changes Chris recently did for ARC4.
I am going to try and cherry-pick just the first four commits and see if that helps with running on ARCHER2.
> I just re-read some of your compile woes. NOTE: GCC 10 does not like the fact that MPI calls can use any data type, so we have to apply "-fallow-argument-mismatch"; I found that back in November and the ARCHER2 team added it to the documentation on building: https://docs.archer2.ac.uk/user-guide/dev-environment/
Thank you for suggesting this. I'll give it a try with adding that compilation flag.
> just wondering what choice of moncs_per_io is set. Is this a version that still uses FFTW rather than the MOSRS new HEAD that uses FFTE? The message seemed to indicate fft_pencil in earlier output. Note Archer2 has 128 cores per node and 8 NUMA regions per node so it is best to have 1 io per numa region: that is 15 moncs per io server. [sorry if you already knew that I did not look at your case]. alternatively do 63 monc per io so that one IO server per socket.
Thank you for suggesting this. I'm not quite sure how to work this out. If you check my run log above you'll see:
[INFO] MONC running with 4 processes, 1 IO server(s)
My moncs_per_io=3 (https://github.com/leifdenby/monc/blob/archer2-compilation/tests/straka_short.mcf#L38) and I think I'm requesting 5 cores in my job (https://github.com/leifdenby/monc/blob/archer2-compilation/utils/archer2/submonc.slurm#L7).
Does that sound reasonable or am I doing something obviously stupid?
The job looks poorly specified. If you want to run a total of 4 MPI tasks (i.e. 1 IO and 3 MONC) then tasks-per-node should also be 4, but then you might choose to spread them out, unless this is just a really basic job and you are happy for all tasks to sit in the same NUMA region. When doing a proper job, consider cpus-per-task if you have fewer than 128 tasks on one node.
Per Ralph's notes, and in agreement with @MarkUoLeeds, it seems that the Leeds branch works with GNU 9.3.0 but not the default GNU 10. Having my module load order as
export PATH=$PATH:/work/y07/shared/umshared/bin
export PATH=$PATH:/work/y07/shared/umshared/software/bin
. mosrs-setup-gpg-agent
module restore PrgEnv-cray
module load cpe-gnu
module load gcc/9.3.0
module load cray-netcdf-hdf5parallel
module load cray-hdf5-parallel
module load cray-fftw/3.3.8.7
module load petsc/3.13.3
seems the best option for compiling with GNU, using fcm make -j4 -f fcm-make/monc-cray-gnu.cfg. The PATH inclusions add in the installed version of fcm and allow you to cache your MOSRS password with the . mosrs-setup-gpg-agent command (as needed when getting casim and socrates from MOSRS).
I'm still trying out the Cray compiler so can't comment on that, and I'm waiting on the test job to run, but I thought I'd mention this.
Thanks for the help here. I think we can close this now that @cemac-ccs is working on a pull request for ARCHER2: https://github.com/Leeds-MONC/monc/pull/45
This is work-in-progress to get MONC compiling and running on ARCHER2