jgfouca opened 4 years ago
@rljacob please assign accordingly
I'll take a look if you want. Anything special to do to get "PGI 18.10 using IntelMPI 2019u3"? Also, what versions of PGI are we using on other systems? (older? newer?)
Summit is using 19.4 and 19.7. So I guess we should just upgrade to those first.
--compiler pgi
on compy is enough to get that config.
I verified the nondeterminacy (two consecutive identical runs generating different results) using
--res ne4_ne4 --compset FC5AV1C-L
and the default 96 MPI processes (no OpenMP) on three nodes, 40 processes per node. Differences showed up quickly (before completion of nstep 1). I also see that there is no Depends file for PGI on Compy, as distinct from Intel on Compy (lots going on) and from PGI on Summit. So, E3SM using PGI has not really been "ported" to Compy yet.
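As an aside, the comparison step in a check like this can be very simple: two bitwise-identical runs should produce byte-identical diagnostic logs, so a plain `cmp` flags any divergence. The sketch below is hypothetical; the log names and contents are fabricated for illustration, not taken from the actual Compy runs.

```shell
# Hypothetical sketch: after two identical runs, a bitwise comparison of the
# diagnostic logs flags nondeterminism. Log names/contents are fabricated.
run1=atm.log.run1
run2=atm.log.run2
printf 'nstep= 1 energy= 0.123456789012345\n' > "$run1"
printf 'nstep= 1 energy= 0.123456789012999\n' > "$run2"
if cmp -s "$run1" "$run2"; then
  echo "bit-for-bit identical"
else
  echo "runs differ: nondeterministic"
fi
```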
I can start looking at what might go into a Depends file, based on the other Depends files, but this may be sensitive to the compiler version, so I prefer to wait until we have installed the version that we are really interested in.
What are Depends files for?
File-specific compiler flags: often downgrading the optimization level for problem files, but also increasing optimization for performance-sensitive files that can tolerate the higher levels of optimization.
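For reference, a Depends file in E3SM/CIME is a makefile fragment with per-file build rules that override the default flags. The sketch below is hypothetical: the object name is a placeholder and the flags are illustrative, not an actual Compy configuration.

```makefile
# Hypothetical Depends.compy.pgi sketch -- NOT an actual Compy configuration.
# Build a known problem file at reduced optimization, leaving the rest of the
# build at the default flags.
REDUCED_OPT_OBJS = some_problem_file.o
$(REDUCED_OPT_OBJS): %.o: %.F90
	$(FC) -c $(INCLDIR) $(INCS) $(FFLAGS) $(FREEFLAGS) -O1 -Mnovect $<
```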
Determined that the above-mentioned F case was (through 5 days, looking at the atm.log)
a) deterministic and reproducible (with respect to process count) for 2, 4, and 5 MPI processes
b) deterministic but NOT reproducible for 8, 10, and 20 MPI processes
c) nondeterministic for 40 MPI processes
using the current compiler flag settings.
Replacing "-O2" with "-O1 -Mnovect" (the first thing I thought to try) eliminated both the nondeterminism and the nonreproducibility in this one F case (tested with 4, 40, and 96 MPI processes, without threading). So, a Depends file approach (and/or a change of compiler version) will allow us to address this issue on Compy. I'll continue poking to see whether the -O1 or the -Mnovect is sufficient by itself. I'll wait on bisection of the files (to see which need the lower level of compiler optimization) until we update to the newer version of the PGI compiler.
-Mnovect was sufficient to restore determinism and reproducibility (in this one example); that is, adding -Mnovect to -O2.
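If we wanted the flag change globally (rather than per-file via a Depends file), machine files of that era carried per-machine compiler flags in CIME's config_compilers.xml. A hypothetical fragment appending -Mnovect for PGI on Compy might look like the following; element names follow CIME conventions but this is a sketch, not copied from the actual Compy entry.

```xml
<!-- Hypothetical sketch: append -Mnovect to the PGI Fortran flags on Compy.
     Not copied from the actual machine files. -->
<compiler MACH="compy" COMPILER="pgi">
  <FFLAGS>
    <append> -Mnovect </append>
  </FFLAGS>
</compiler>
```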
Thanks Pat for working on it. pgi/19.7 and pgi/19.10 with a complete software stack are available on Compy if you would like to try. Let me know if you would like me to change the machine files for these compilers.
Thanks @singhbalwinder . I would like to try them, and I would like your help in doing so. What do I need? Just the modified machine files (which you can provide)?
I don't know who will decide which we will be using going forward, but I'll try to gather some information about these options.
I tried pgi/19.7 but it seems like it is missing pnetcdf compiled with IntelMPI. I have asked support to build that library.
My understanding is that determinacy (running the same calculation over and over again and getting bitwise reproducible results) for both MPI and OpenMP is not actually part of the spec, AFAIK. It is implementation dependent. I will say that on the BlueGene series of computers, MPI and OpenMP were implemented in a deterministic manner, according to IBM. There are some universal exceptions; for example, if you use MPI_ANY_TAG with Send/Recv (this happens in ScaLAPACK). Another example is Intel MKL, where the execution path can depend on the state of the CPU -- it is not even deterministic in serial (no OpenMP, no MPI).
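On the MKL point specifically: Intel MKL does provide Conditional Numerical Reproducibility (CNR) controls that force a fixed code path regardless of runtime CPU dispatch, at some performance cost. A minimal environment-variable sketch (per Intel's CNR documentation; whether this helps any given E3SM build is an assumption, not something verified here):

```shell
# Sketch: pin Intel MKL to a consistent code path across CPU dispatch
# decisions. MKL_CBWR is MKL's Conditional Numerical Reproducibility control;
# "COMPATIBLE" selects the most conservative (and slowest) path.
export MKL_CBWR=COMPATIBLE
echo "MKL_CBWR=$MKL_CBWR"
```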
Note that this is standard practice for E3SM and its predecessors, and the coding guidelines (and check-in requirements) have allowed us to preserve this for the past 30 years. In this case, it is easily restored with a change to the compiler flags (and is not an MPI or OpenMP issue). We will give it up when there is a significant benefit and when we have reliable methods to determine correctness that do not require deterministic behavior. We realize that giving up reproducibility with respect to process and thread count will have to happen some day, but preserving it for DEBUG will always be a goal even then. Giving up deterministic behavior is not a topic that we have considered much yet, despite hearing warnings that even this is likely to disappear some day.
@singhbalwinder , any progress on getting pgi/19.4 or pgi/19.7 installed and working with E3SM? Thanks.
The compiler is installed, but the MPI compiler wrappers are different from those for the previous compiler. I have asked support to make them the same for every PGI compiler installation, so that we only have to modify one place if the compiler version changes in the future. I will ping them once more to see whether it is ready.
@worleyph : Please use the branch of PR #3382, which updates the PGI compiler on Compy to 19.7. It works, but the code is blowing up due to an array bound error in the atm code for some reason. I will look into it.
0: Subscript out of range for array zvirv (/compyfs/sing201/delete/E3SM/components/cam/src/physics/cam/physics_types.F90: 423)
42: subscript=1, lower bound=140726920586784, upper bound=140726920586811, dimension=1
@singhbalwinder , any progress on debugging the runtime error you discovered when trying to use pgi/19.7? Would it be easy to try pgi/19.10 by my modifying the PR #3382 branch?
I didn't get time to work on it. I will try to reproduce it and work with support to fix these errors.
pgi/19.10 has fewer errors (no more ATM errors mentioned by Balwinder), but the build issue with clm/src/external_models/emi/src/emi/ExternalModelInterfaceMod.F90
and the reproducibility issue (both with different numbers of threads and with different numbers of tasks in pure-MPI mode) are still there.
Just a reminder: the reproducibility (and nondeterminism) issues will not be addressed until we construct a Depends file that lowers optimization on a subset of files, as we already do for Intel (and have done for PGI on other systems in the past).
The following tests:
Do not produce the same results from run-to-run, even with the same commit. This is with PGI 18.10 using IntelMPI 2019u3
Note: switching to the Intel compiler somehow fixes this issue.