E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

many e3sm tests do not produce repeatable results with compy+pgi #3337

Open jgfouca opened 4 years ago

jgfouca commented 4 years ago

The following tests:

Test 'NCK.ne11_oQU240.A_WCYCL1850.compy_pgi' finished with status 'DIFF'
Test 'SMS.ne30_oECv3.BGCEXP_BCRC_CNPECACNT_1850.compy_pgi.clm-bgcexp' finished with status 'DIFF'
Test 'SMS.ne30_oECv3.BGCEXP_BCRC_CNPRDCTC_1850.compy_pgi.clm-bgcexp' finished with status 'DIFF'
Test 'SMS.ne4_ne4.FC5AV1C-L.compy_pgi.cam-cosplite' finished with status 'DIFF'
Test 'SMS.ne4_ne4.FC5AV1C-L.compy_pgi.cam-preqx_ftype0' finished with status 'DIFF'
Test 'SMS.ne4_ne4.FC5AV1C-L.compy_pgi.cam-preqx_ftype1' finished with status 'DIFF'
Test 'SMS.ne4_ne4.FC5AV1C-L.compy_pgi.cam-preqx_ftype4' finished with status 'DIFF'
Test 'SMS.ne4_ne4.FC5AV1C-L.compy_pgi.cam-thetahy_ftype0' finished with status 'DIFF'
Test 'SMS.ne4_ne4.FC5AV1C-L.compy_pgi.cam-thetahy_ftype1' finished with status 'DIFF'
Test 'SMS.ne4_ne4.FC5AV1C-L.compy_pgi.cam-thetahy_ftype2' finished with status 'DIFF'
Test 'SMS.ne4_ne4.FC5AV1C-L.compy_pgi.cam-thetahy_ftype4' finished with status 'DIFF'
Test 'SMS.ne4_ne4.FC5AV1C-L.compy_pgi.cam-thetanh_ftype0' finished with status 'DIFF'
Test 'SMS.ne4_ne4.FC5AV1C-L.compy_pgi.cam-thetanh_ftype1' finished with status 'DIFF'
Test 'SMS.ne4_ne4.FC5AV1C-L.compy_pgi.cam-thetanh_ftype2' finished with status 'DIFF'
Test 'SMS.ne4_ne4.FC5AV1C-L.compy_pgi.cam-thetanh_ftype4' finished with status 'DIFF'
Test 'SMS_Ln5.ne4pg2_ne4pg2.FC5AV1C-L.compy_pgi' finished with status 'DIFF'
Test 'SMS_Ln5.ne4pg2_ne4pg2.FC5AV1C-L.compy_pgi.cam-thetahy_pg2' finished with status 'DIFF'
Test 'SMS_Ln9.ne4_ne4.FC5AV1C-L.compy_pgi.cam-outfrq9s' finished with status 'DIFF'

These tests do not produce the same results from run to run, even with the same commit. This is with PGI 18.10 using IntelMPI 2019u3.

Note: switching to the Intel compiler somehow avoids this issue.

jgfouca commented 4 years ago

@rljacob please assign accordingly

worleyph commented 4 years ago

I'll take a look if you want. Anything special to do to get "PGI 18.10 using IntelMPI 2019u3"? Also, what versions of PGI are we using on other systems? (Older? Newer?)

rljacob commented 4 years ago

Summit is using 19.4 and 19.7. So I guess we should just upgrade to those first.

--compiler pgi on compy is enough to get that config.

worleyph commented 4 years ago

I verified the nondeterminism (two consecutive identical runs generating different results) using

 --res ne4_ne4 --compset FC5AV1C-L 

and the default 96 MPI processes (no OpenMP) on three nodes, 40 processes per node. Differences showed up quickly (before completion of nstep 1). I also see that there is no Depends file for PGI on Compy, as distinct from Intel on Compy (lots going on) and from PGI on Summit. So, E3SM using PGI has not really been "ported" to Compy yet.

I can start looking at what might go into a Depends file, based on the other Depends files, but this may be sensitive to the compiler version, so I prefer to wait until we have installed the version that we are really interested in.

oksanaguba commented 4 years ago

What are Depends files for?

worleyph commented 4 years ago

File-specific compiler flags: often downgrading the optimization level for problem files, but also increasing optimization for performance-sensitive files that can handle the higher optimization levels.
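For reference, the existing Depends files follow a makefile pattern roughly along these lines. This is a sketch modeled on that pattern; the variable names and object file names are illustrative, not the contents of any actual Depends file:

```make
# Sketch of a Depends.<machine>.<compiler> file (illustrative names).
# Files that need reduced optimization to behave correctly:
REDUCED_OPT_OBJS = \
  some_problem_file.o

# Files worth compiling at a higher optimization level:
PERF_OBJS = \
  some_hot_loop_file.o

# Override the default compile rule for just these objects
# (recipes must be indented with a tab, as usual for make).
ifeq ($(DEBUG),FALSE)
  $(REDUCED_OPT_OBJS): %.o: %.F90
	$(FC) -c $(INCLDIR) $(INCS) $(FFLAGS) -O1 $<

  $(PERF_OBJS): %.o: %.F90
	$(FC) -c $(INCLDIR) $(INCS) $(FFLAGS) -O3 $<
endif
```

The static pattern rules take precedence over the generic `%.o: %.F90` rule for the listed objects, so everything else still builds with the machine's default flags.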

worleyph commented 4 years ago

I determined that the above-mentioned F case was (through 5 days, looking at the atm.log)

 a) deterministic and reproducible (with respect to process count) for 2, 4, and 5 MPI processes

 b) deterministic but NOT reproducible for 8, 10, and 20 MPI processes

 c) nondeterministic for 40 MPI processes

using the current compiler flag settings.
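The run-to-run comparison above amounts to checking whether two logs of per-step diagnostics ever diverge. A minimal sketch of that kind of check (the two-column `"nstep value"` log format and the `first_diverging_step` helper are illustrative, not the actual atm.log layout):

```python
def first_diverging_step(log_a, log_b):
    """Return the first step at which two runs' diagnostics differ, or None.

    Each log is a sequence of 'nstep value' lines; comparing the printed
    digits is effectively a bitwise comparison for full-precision output.
    """
    for line_a, line_b in zip(log_a, log_b):
        step_a, val_a = line_a.split()
        _, val_b = line_b.split()
        if val_a != val_b:
            return int(step_a)
    return None

# Two hypothetical runs of the same case, diverging at step 2:
run1 = ["0 0.2500000000", "1 0.2600000000", "2 0.2700000000"]
run2 = ["0 0.2500000000", "1 0.2600000000", "2 0.2700000001"]
print(first_diverging_step(run1, run2))  # -> 2
```

A deterministic case returns None for every pair of identical reruns; a nondeterministic one, as seen here with 40 processes, diverges within the first few steps.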

worleyph commented 4 years ago

Replacing "-O2" with "-O1 -M novect" (the first thing I thought to try) eliminated the nondeterminism and the nonreproducibility in this one F case (tested with 4, 40, and 96 MPI processes, without threading). So, a Depends file approach (and/or a change of compiler version) will allow us to address this issue on Compy. I'll continue poking to see whether -O1 or -M novect is sufficient by itself. I'll wait on bisection of the files (to see which need the lower level of compiler optimization) until we update to the newer version of the PGI compiler.

worleyph commented 4 years ago

-Mnovect was sufficient to restore determinism and reproducibility (in this one example); that is, adding -Mnovect to -O2 was enough.
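For context on why disabling vectorization can restore bitwise reproducibility: floating-point addition is not associative, so a vectorized reduction that regroups terms can round differently than the equivalent serial loop. A minimal Python illustration of the underlying effect (unrelated to the E3SM code itself):

```python
# Floating-point addition is not associative: regrouping the same terms
# (as a vectorized reduction may do across SIMD lanes) changes the result.
serial    = (0.1 + 0.2) + 0.3   # left-to-right, as an unvectorized loop sums
regrouped = 0.1 + (0.2 + 0.3)   # a different association of the same terms

print(serial)     # 0.6000000000000001
print(regrouped)  # 0.6
```

When the grouping additionally depends on run-time factors (alignment, process count, message arrival order), the rounding differences become run-to-run differences rather than just compiler-version differences.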

singhbalwinder commented 4 years ago

Thanks Pat for working on it. pgi/19.7 and pgi/19.10 with complete software stack are available on Compy if you would like to try. Let me know if you would like me to change the machine files for these compilers.

worleyph commented 4 years ago

Thanks @singhbalwinder . I would like to try them, and I would like your help in doing so. What do I need? Just the modified machine files (which you can provide)?

I don't know who will decide which we will be using going forward, but I'll try to gather some information about these options.

singhbalwinder commented 4 years ago

I tried pgi/19.7 but it seems like it is missing pnetcdf compiled with IntelMPI. I have asked support to build that library.

naromero77 commented 4 years ago

My understanding is that determinacy (running the same calculation over and over and getting bitwise-reproducible results) is not actually part of the spec for either MPI or OpenMP, AFAIK. It is implementation dependent. I will say that on the BlueGene series of computers, MPI and OpenMP were implemented in a deterministic manner, according to IBM. There are some universal exceptions: for example, using MPI_ANY_TAG with Send/Recv (this happens in ScaLAPACK). Another example is Intel MKL, where the execution path can depend on the state of the CPU, so it is not deterministic even in serial (no OpenMP, no MPI).

worleyph commented 4 years ago

Note that deterministic, reproducible behavior is standard practice for E3SM and its predecessors, and the coding guidelines (and check-in requirements) have allowed us to preserve it for the past 30 years. In this case it is easily restored with a change to the compiler flags (and is not an MPI or OpenMP issue). We will give it up when there is a significant benefit and when we have reliable methods of determining correctness that do not require deterministic behavior. We realize that giving up reproducibility with respect to process and thread count will have to happen some day, but preserving it for DEBUG builds will remain a goal even then. Giving up deterministic behavior is not a topic that we have considered much yet, despite hearing warnings that even this is likely to disappear some day.

worleyph commented 4 years ago

@singhbalwinder , any progress on getting pgi/19.4 or pgi/19.7 installed and working with E3SM? Thanks.

singhbalwinder commented 4 years ago

The compiler is installed, but the MPI compiler wrappers are different from those for the previous compiler. I have asked support to make them the same for every PGI compiler installation, so that we only have to modify one place if the compiler version changes in the future. I will ping them once more to see whether it is ready.

singhbalwinder commented 4 years ago

@worleyph: Please use the branch of PR #3382, which updates the PGI compiler on Compy to 19.7. The build works, but the code is blowing up with an array-bounds error in the atm code for some reason. I will look into it.

0: Subscript out of range for array zvirv (/compyfs/sing201/delete/E3SM/components/cam/src/physics/cam/physics_types.F90: 423)
42:     subscript=1, lower bound=140726920586784, upper bound=140726920586811, dimension=1
worleyph commented 4 years ago

@singhbalwinder, any progress on debugging the runtime error you discovered when trying to use pgi/19.7? Would it be easy for me to try pgi/19.10 by modifying the PR #3382 branch?

singhbalwinder commented 4 years ago

I didn't get time to work on it. I will try to reproduce it and work with support to fix these errors.

amametjanov commented 4 years ago

pgi/19.10 has fewer errors (no more ATM errors of the kind Balwinder mentioned), but the build issue with clm/src/external_models/emi/src/emi/ExternalModelInterfaceMod.F90 and the reproducibility issues (both with different numbers of threads and with different numbers of tasks in pure-MPI mode) are still there.

worleyph commented 4 years ago

Just a reminder: the reproducibility (and nondeterminism) issues will not be addressed until we construct a Depends file that lowers optimization on a subset of files, as we already do for Intel (and as was done for PGI on other systems in the past).