GEOS-ESM / GEOSgcm_GridComp

Repository containing the physics and IAU code for the GEOS Earth System Model
Apache License 2.0

MKIAU_JCAP broken #311

Open · wmputman opened this issue 4 years ago

wmputman commented 4 years ago

When we run replays, I am no longer able to use the MKIAU_JCAP option. The run crashes with a segmentation fault in:

GEOSmkiau_GridComp/GEOS_mkiauGridComp.F90

Integer*4 Resource Parameter: MKIAU_JCAP:60
Current nymd: 20200120 nhms: 0 FAC: 1.00000
Creating GRIDana...
Integer*4 Resource Parameter: BKG2ANACNSRV:0
Integer*4 Resource Parameter: ANA2BKGCNSRV:0
CFIO: Reading /discover/nobackup/projects/gmao/g6dev/sdrabenh/valdat/era5/ana_eta_daily/Y2020/M01/era5_ana.eta_L137.20200120_00z.nc4 at 20200120 0

REPLAY File Dimensions: 1440 721 137

REPLAY File Variables, NQ: 7

       1 )  dp
       2 )  phis
       3 )  ps
       4 )  qv
       5 )  tv
       6 )  u
       7 )  v

REPLAY Options:

REPLAY_TS ....... NO
REPLAY_P ........ YES
REPLAY_U ........ YES
REPLAY_V ........ YES
REPLAY_T ........ YES
REPLAY_QV ....... YES
REPLAY_O3 ....... NO

Vertical Remapping ANA Data to BKG Topography and Levels ...

Blending Between 1.000000 & 50.00000 mb
No blending of QV based on TROPP

Applying Mass Divergence Fix ...

[borgf058:162020:0:162020] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2aabc9579000)
==== backtrace ====
[borgf059:247038:0:247038] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x10)
==== backtrace ====
[borgf143:47492:0:47492] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x10)
[borgf050:166372:0:166372] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x10)
==== backtrace ====

bena-nasa commented 4 years ago

Luckily, it also breaks at C24 when I enable JCAP in a standard replay from MERRA2, so it is easy to reproduce.

mathomp4 commented 4 years ago

@bena-nasa I think we've been here before. From #134 (this comment):

The issue turned out to be that GEOSgcm.x is now linking to the r8i4 version of the NCEP_sp library, whereas the CVS build linked to the r4i4 version. It picks up the r8i4 version because GMAO_transf in shared links to the r8i4 version, and GEOSgcm.x apparently inherits that link dependency. Telling GMAO_transf to use the r4i4 version via its CMakeLists.txt file lets GEOSgcm.x link to the r4i4 version and lets the JCAP option in mkiau run, but when the stochastic physics is turned on, this results in a non-zero diff relative to the original code.

We might need @tclune-level CMake trickery for this, something à la the fms_r4/r8 libraries.

ETA: Essentially, it's easy to fix if you don't mind the stochastic physics becoming non-zero-diff. That might not be a "bad" thing, as it would just be "different" stochastic randomness, but that's a question for Amal.
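One plausible mechanism for the JCAP crash, which this thread does not confirm, is a precision mismatch. Assuming the suffixes mean what they suggest (r4i4: default REAL and INTEGER of 4 bytes; r8i4: default REAL of 8 bytes, INTEGER of 4 bytes), a caller built with 4-byte REAL that passes an array by reference to an r8i4 routine would have its buffer read as 8-byte values, running past the end of the allocation. The following Python sketch only illustrates that byte arithmetic; `overrun_bytes` is a hypothetical helper, and the 1440 x 721 size is borrowed from the REPLAY file purely as an example.

```python
import array

# Assumption: "r4" / "r8" in the library suffix is the default REAL size in bytes.
R4, R8 = 4, 8


def overrun_bytes(n_reals: int, caller_kind: int = R4, callee_kind: int = R8) -> int:
    """Bytes a callee would read past the caller's buffer if it assumes a wider REAL."""
    return n_reals * callee_kind - n_reals * caller_kind


# Four REAL*4 values laid out in memory the way the caller allocated them ...
caller_buf = array.array("f", [1.0, 2.0, 3.0, 4.0]).tobytes()

# ... reinterpreted by a routine that assumes REAL*8.
callee_view = array.array("d")
callee_view.frombytes(caller_buf)

print(len(caller_buf))        # -> 16 (bytes allocated by the caller)
print(len(callee_view))       # -> 2 (only two REAL*8 values fit in those bytes)
print(callee_view[0] == 1.0)  # -> False (the reinterpreted value is garbage)

# Reading a full 1440 x 721 field as REAL*8 would overrun a REAL*4 buffer by:
print(overrun_bytes(1440 * 721))  # -> 4152960
```

Because INTEGER stays 4 bytes in both builds under this assumption, only REAL arguments would be misread, which would fit a crash that appears or disappears depending on which NCEP_sp variant GEOSgcm.x ends up linking.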

tclune commented 4 years ago

Made the mistake of reading this late in the evening. Making my head hurt. Will try again in the morning.

tclune commented 4 years ago

The executable (GEOSgcm.x) can only link to one build or the other of NCEP_sp. We can choose, but we cannot have both. (We could produce 2 executables though ...)

Presuming that the CVS build linked with r4i4, as cited by @mathomp4, the question is why doing so now would change the answers. The best explanation would seem to be that at some point after the git conversion, GMAO_transf triggered a change to use the r8i4 library, and switching back to r4i4 is the "right" thing to do.

I fear that the answer is going to be to use r4i4 for GCM and use r8i4 for DAS.

bena-nasa commented 4 years ago

Should I just make the PR to move GMAO_transf to the r4i4 sp library? I confirmed it still fixes the JCAP issue, and as Tom said, we can only link to one. The stochastic physics still runs with the r4 library, but it is non-zero diff. However, regardless of which library is used, with debugging enabled the code crashes with an out-of-bounds error in a routine called by the stochastic physics, so that has other issues that need to be fixed anyway.

atrayano commented 4 years ago

@bena-nasa My 2 cents: I think it is a good idea to open a PR that fixes this issue (by using the r4 version) and, in addition, to open another issue describing the out-of-bounds error when using stochastic physics. I am not super concerned about the non-zero diff, although whether to adopt this solution is ultimately up to @wmputman and @sdrabenh.

bena-nasa commented 4 years ago

OK, I've made the PR in GMAO_Shared to change this. It is ultimately up to Scott and Bill whether they want to take it.

mathomp4 commented 4 years ago

GMAO_Shared PR is here: https://github.com/GEOS-ESM/GMAO_Shared/pull/107