ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/

nldas2 grid fails when run in DEBUG mode #1724

Closed ekluzek closed 1 year ago

ekluzek commented 2 years ago

Brief summary of bug

The following test fails in datm when run in DEBUG mode

SMS_D_Ld1_PS.nldas2_rnldas2_mnldas2.I2000Ctsm50NwpSpNldasRs.cheyenne_intel.clm-default
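For readers unfamiliar with CIME test names, the sketch below splits a name like the one above into its dot-separated parts and checks for the `_D` (DEBUG) modifier. The helper name `parse_cime_test_name` is hypothetical and not part of CIME. It assumes the usual TESTTYPE[_MODIFIERS].GRID.COMPSET.MACHINE_COMPILER.TESTMODS layout, with no dots inside the grid or compset fields.

```python
def parse_cime_test_name(name):
    """Split a CIME-style test name into its parts (hypothetical helper, not a CIME API)."""
    parts = name.split(".")
    case_part, grid, compset = parts[0], parts[1], parts[2]
    # The machine_compiler and testmods fields are optional in a test name.
    machine_compiler = parts[3] if len(parts) > 3 else None
    testmods = parts[4] if len(parts) > 4 else None
    # The first underscore-separated token is the test type; the rest are modifiers.
    test_type, *modifiers = case_part.split("_")
    return {
        "test_type": test_type,
        "modifiers": modifiers,
        "debug": "D" in modifiers,  # a "D" modifier requests a DEBUG build
        "grid": grid,
        "compset": compset,
        "machine_compiler": machine_compiler,
        "testmods": testmods,
    }


failing = parse_cime_test_name(
    "SMS_D_Ld1_PS.nldas2_rnldas2_mnldas2.I2000Ctsm50NwpSpNldasRs.cheyenne_intel.clm-default"
)
print(failing["test_type"], failing["modifiers"], failing["debug"])
```

Under these assumptions, the failing test is an SMS (smoke) test with the modifiers D, Ld1 and PS. Dropping D from the name gives the non-DEBUG variant.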

General bug information

CTSM version you are using: ctsm5.1.dev091

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: nldas2 grid

Details of bug

It looks like the problem is inside datm, so it may be a problem with CDEPS/datm rather than CTSM.

Important details of your setup / configuration so we can reproduce the bug

We run the above test without DEBUG mode for all tags, so it does run when done without DEBUG mode.

Important output or errors that show the problem

4:MPT:    from /glade/u/apps/ch/os/lib64/libpthread.so.0
4:MPT: #1  0x00002b8986101306 in mpi_sgi_system (
4:MPT: #2  MPI_SGI_stacktraceback (
4:MPT:     header=header@entry=0x7ffe62735f50 "MPT ERROR: Rank 4(g:4) received signal SIGFPE(8).\n\tProcess ID: 58542, Host: r12i6n14, Program: /glade/scratch/erik/SMS_D_Ld1_PS.nldas2_rnldas2_mnldas2.I2000Ctsm50NwpSpNldasRs.cheyenne_intel.clm-defaul"...) at sig.c:340
4:MPT: #3  0x00002b89861014ff in first_arriver_handler (signo=signo@entry=8, 
4:MPT:     stack_trace_sem=stack_trace_sem@entry=0x2b89959a0080) at sig.c:489
4:MPT: #4  0x00002b8986101793 in slave_sig_handler (signo=8, siginfo=<optimized out>, 
4:MPT:     extra=<optimized out>) at sig.c:565
4:MPT: #5  <signal handler called>
4:MPT: #6  0x00000000009af0d8 in datm_datamode_clmncep_mod::datm_datamode_clmncep_advance (mainproc=.FALSE., logunit=6, mpicom=24, rc=0)
4:MPT:     at /glade/scratch/erik/ctsm5.1.dev091/components/cdeps/datm/datm_datamode_clmncep_mod.F90:422
4:MPT: #7  0x00000000009a1bc2 in atm_comp_nuopc::datm_comp_run (importstate=..., 
4:MPT:     exportstate=..., target_ymd=20000101, target_tod=0, target_mon=1, 
4:MPT:     orbeccen=0.016703660392765603, orbmvelpp=4.9374577904881578, 
4:MPT:     orblambm0=-0.032472495661529328, orbobliqr=0.40910112257977893, 
4:MPT:     restart_write=.FALSE., rc=0)
4:MPT:     at /glade/scratch/erik/ctsm5.1.dev091/components/cdeps/datm/atm_comp_nuopc.F90:630
4:MPT: #8  0x000000000099f7da in atm_comp_nuopc::initializerealize (gcomp=..., 
4:MPT:     importstate=..., exportstate=..., clock=..., rc=0)
4:MPT:     at /glade/scratch/erik/ctsm5.1.dev091/components/cdeps/datm/atm_comp_nuopc.F90:411
4:MPT: #9  0x00002b897fa5c489 in ESMCI::FTable::callVFuncPtr (this=0xaaf52f0, 
4:MPT:     name=0xaaea010 "InitializeIC07P", vm_pointer=0xaaf5170, 
4:MPT:     userrc=0x7ffe62739c08)
4:MPT:     at /glade/p/cesmdata/cseg/PROGS/build/28560/esmf-8.2.0b23/src/Superstructure/Component/src/ESMCI_FTable.C:2167
4:MPT: #10 0x00002b897fa57c31 in ESMCI_FTableCallEntryPointVMHop (vm=0xaaf5170, 
4:MPT:     cargoCast=0xaaea010)

ekluzek commented 2 years ago

The CDEPS issue this refers to is:

https://github.com/ESCOMP/CDEPS/issues/143

mvertens commented 2 years ago

@ekluzek - I have incorporated a fix, available in the latest CDEPS tag cdeps0.12.47, that makes this case work. I have compared the cpl7 and cmeps test results and things look good. If you update to that cdeps tag, you will also need to update cime, ccs_config, cmeps and share in ctsm.

ekluzek commented 2 years ago

Perfect -- thank you! I'm assuming this means the nldas2 cases will have different answers?

mvertens commented 2 years ago

Yes - using nuopc will always give you different answers. I have verified that the values coming in from the nldas2 forcings (which are on the same grid as the model) differ only by roundoff. However, the aerosol forcings are mapped from a 2 degree grid to the nldas2 mesh, and since ESMF bilinear mapping is different from the internal cpl7 data model mapping, there will be differences.
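To illustrate why a change of mapping scheme alone changes answers, here is a minimal sketch that interpolates the same coarse field to the same target point with two schemes. This is not the ESMF or cpl7 implementation: the functions `bilinear` and `nearest` are hypothetical, the grid is a toy 2x2 array with unit spacing, and nearest-neighbor is used only as a stand-in for "some other scheme", since the thread does not say what the cpl7 data model mapping does.

```python
def bilinear(grid, x, y):
    """Bilinear interpolation on a unit-spaced grid; grid[j][i] is the value at (i, j)."""
    i, j = int(x), int(y)
    fx, fy = x - i, y - j
    # Interpolate along x on the lower and upper rows, then along y between them.
    lower = grid[j][i] * (1 - fx) + grid[j][i + 1] * fx
    upper = grid[j + 1][i] * (1 - fx) + grid[j + 1][i + 1] * fx
    return lower * (1 - fy) + upper * fy


def nearest(grid, x, y):
    """Nearest-neighbor lookup on the same grid (illustrative stand-in only)."""
    return grid[round(y)][round(x)]


coarse = [[1.0, 2.0],
          [3.0, 4.0]]  # toy "coarse" field, values are made up

b = bilinear(coarse, 0.25, 0.25)
n = nearest(coarse, 0.25, 0.25)
print(b, n)  # prints "1.75 1.0"
```

The two schemes disagree at the same target point by far more than roundoff, which is the kind of difference described above for the mapped aerosol forcings.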

ekluzek commented 2 years ago

@mvertens no, I mean the non-DEBUG test cases that we run with nldas2 under nuopc. Here are the two baselines for ctsm5.1.dev091 on cheyenne. I'm just wondering if I should expect answers to change when I run the test suite for these two tests...

/glade/p/cgd/tss/ctsm_baselines/ctsm5.1.dev091/SMS_Ld1_PS.nldas2_rnldas2_mnldas2.I2000Ctsm50NwpSpNldas.cheyenne_gnu.clm-default
/glade/p/cgd/tss/ctsm_baselines/ctsm5.1.dev091/SMS_Ld1_PS.nldas2_rnldas2_mnldas2.I2000Ctsm50NwpSpNldasRs.cheyenne_gnu.clm-default

ekluzek commented 1 year ago

The underlying CDEPS issue was fixed, and this seems to have been fixed in CTSM as of ctsm5.1.dev116, so closing.