ACCESS-NRI / accessdev-Trac-archive

Archive accessdev Trac contents as issues
Apache License 2.0

Test coupled model with Intel MPI #354

Open penguian opened 6 years ago

penguian commented 6 years ago

by mrd599@nci.org.au


Test whether switching to Intel MPI resolves the "too many retries" problems at startup.


Issue migrated from trac:354 at 2024-01-31 18:33:18 +1100

penguian commented 6 years ago

@martin.dix@anu.edu.au changed status from new to accepted

penguian commented 6 years ago

@martin.dix@anu.edu.au set owner to mrd599

penguian commented 6 years ago

@martin.dix@anu.edu.au commented


Built oasis3-mct using

module load intel-fc/15.0.1.133
module load intel-cc/15.0.1.133
module load intel-mkl/15.0.1.133
module load intel-mpi/5.1.3.210
module load netcdf/4.3.2

Created module oasis3-mct-local/intelmpi.5.1.3.210

oasis3_tutorial test works.

Intel mpirun doesn't support the -wd argument; it uses -wdir instead. OpenMPI also accepts -wdir, so the same option works with both.
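
For illustration, a launch line that sets a working directory per executable with -wdir might be assembled as in the sketch below. The directory names, executable names and rank counts are hypothetical placeholders, not those used by the ACCESS run scripts, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: directory names, executable names and rank counts are
# hypothetical placeholders, not taken from the ACCESS run scripts.
# Prints an MPMD mpirun command that sets a per-executable working
# directory with -wdir instead of -wd.
build_mpirun_cmd() {
    printf 'mpirun'
    sep=' '
    # Each argument has the form ranks:workdir:executable
    for spec in "$@"; do
        ranks=${spec%%:*}
        rest=${spec#*:}
        wdir=${rest%%:*}
        exe=${rest#*:}
        printf '%s-n %s -wdir %s %s' "$sep" "$ranks" "$wdir" "$exe"
        sep=' : '
    done
    printf '\n'
}

build_mpirun_cmd 4:atmosphere:./um.exe 2:ocean:./mom.exe 1:ice:./cice.exe
```

Printing the command first makes it easy to compare what the Intel MPI and OpenMPI variants of a script would actually run.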

The CICE build script uses mpifort, which isn't set up in the intel-mpi environment. Use mpif90 instead.
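
One way a build script could cope with either environment is to pick whichever wrapper is available. This is a sketch under assumptions, not the change made to the CICE build script: the function name and the FC variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: picks the first MPI Fortran wrapper found on PATH.
# pick_mpi_fc and FC are hypothetical names, not from the CICE build script.
pick_mpi_fc() {
    for wrapper in mpifort mpif90; do
        if command -v "$wrapper" >/dev/null 2>&1; then
            printf '%s\n' "$wrapper"
            return 0
        fi
    done
    echo "no MPI Fortran wrapper found" >&2
    return 1
}

# Example use: fall back to an empty value if neither wrapper exists.
FC=$(pick_mpi_fc) || FC=""
echo "Fortran wrapper: ${FC:-none}"
```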

Intel mpirun doesn't support a separate rankfile or hostfile for each executable. Use a single hostfile for the whole job (which is probably redundant).
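
A single job-wide hostfile could be generated along these lines. The sketch assumes the PBS node list in $PBS_NODEFILE has one host per line, possibly repeated once per core; the function name and the hosts.txt filename are hypothetical.

```shell
#!/bin/sh
# Sketch only: builds one hostfile for the whole job from a PBS node list.
# Assumes one host per line in the node list, possibly repeated per core.
# make_hostfile and hosts.txt are hypothetical names.
make_hostfile() {
    nodefile=$1
    hostfile=$2
    # Keep the first occurrence of each host, preserving order.
    awk '!seen[$0]++' "$nodefile" > "$hostfile"
}

if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    make_hostfile "$PBS_NODEFILE" hosts.txt
    # The resulting file can then be passed to Intel mpirun with -f hosts.txt.
fi
```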

Add access-coupled-intelmpi and access-atmos-intelmpi scripts.

Test suite u-au795 is a copy of u-aq959.

At runtime, need to load modules that the normal suite requires only at build time:

            module load intel-fc/17.0.1.132
            module load libpng
            module load openjpeg
            module load zlib

Don't understand this at the moment.
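
A possible way to investigate is to check which shared libraries an executable cannot resolve without those modules loaded. This is only a sketch: it assumes the runtime requirement comes from dynamic linking, which has not been confirmed here, and the executable path is supplied by the caller.

```shell
#!/bin/sh
# Sketch only: lists shared libraries that ldd reports as "not found".
# Assumes the runtime module requirement is due to dynamic linking,
# which has not been confirmed.
filter_missing() {
    awk '/not found/ { print $1 }'
}

# Usage: pass the path of the executable to check as the first argument.
exe=${1:-}
if [ -n "$exe" ] && [ -x "$exe" ]; then
    ldd "$exe" | filter_missing
fi
```

Running it once with and once without the extra modules loaded would show whether any libraries are resolved only through the module environment.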

penguian commented 6 years ago

@martin.dix@anu.edu.au changed _comment0 which not transferred by tractive

penguian commented 6 years ago

@martin.dix@anu.edu.au changed _comment1 which not transferred by tractive

penguian commented 6 years ago

@martin.dix@anu.edu.au commented


Three-month run on Broadwell: UM 28x20, MOM 8x14, CICE 28 cores. UM timings are reported to avoid any PBS delays.

Intel MPI (u-au795)
    Maximum Elapsed Wallclock Time: 5763.74
    Rerun Maximum Elapsed Wallclock Time: 6650.37

OpenMPI 1.10.2 (u-aq795)
    Maximum Elapsed Wallclock Time: 5074.78

Results were identical.
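
To put the timings in relative terms, the small helper below computes how much slower one wallclock time is than another, as a percentage. The function name is hypothetical; the inputs are the UM timings quoted above.

```shell
#!/bin/sh
# Sketch only: percentage by which time $1 exceeds time $2.
# pct_slower is a hypothetical helper name.
pct_slower() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# Intel MPI first run and rerun, each relative to the OpenMPI 1.10.2 run.
pct_slower 5763.74 5074.78
pct_slower 6650.37 5074.78
```

With these figures the Intel MPI run comes out about 13.6% slower than the OpenMPI run, and the rerun about 31.0% slower.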

penguian commented 6 years ago

@martin.dix@anu.edu.au changed _comment0 which not transferred by tractive

penguian commented 6 years ago

@martin.dix@anu.edu.au changed component from ACCESS model to ACCESS-CM2