ESCOMP / RTM

River Transport Model, RTM, part of the Community Earth System Model
http://www.cesm.ucar.edu/

Error in writing out r4 real data #11

Open ekluzek opened 6 years ago

ekluzek commented 6 years ago

This is with release-cesm2.0.01 which brings in a change to write out real data as r4 instead of everything being r8.

The following tests fail:

ERP_Ld5.f10_f10_musgs.I2000Clm50Vic.cheyenne_gnu.clm-decStart.GC.release-clm5009chgnua
ERP_P36x2_D_Ld5.f10_f10_musgs.I1850Clm45Bgc.cheyenne_gnu.clm-default.GC.release-clm5009chgnua
ERP_P36x2_D_Ld5.f10_f10_musgs.I1850Clm45BgcCru.cheyenne_intel.clm-default.GC.release-clm5009chintela
ERP_P36x2_D_Ld5.f10_f10_musgs.I2000Clm45Sp.cheyenne_intel.clm-default.GC.release-clm5009chintela
ERP_P36x2_D_Ld5.f10_f10_musgs.IHistClm45BgcCruGs.cheyenne_intel.clm-decStart.GC.release-clm5009chintela
ERS_Ly5_P72x1.f10_f10_musgs.IHistClm45BgcCrop.cheyenne_intel.clm-cropMonthOutput.GC.release-clm5009chintela

The failure looks like this:

1: pio_support::pio_die:: myrank=          -1 : ERROR: 
1: pionfwrite_mod::write_nfdarray_real:         250 : 
1: NetCDF: Numeric conversion not representable
37: pionfwrite_mod::write_nfdarray_real         107   IAM:            1  start: 
37:                     1                   181  count:                    720
37:                   180  size :                     1  error:          -60
37:         598           0
37: pio_support::pio_die:: myrank=          -1 : ERROR: 
37: pionfwrite_mod::write_nfdarray_real:         250 : 
37: NetCDF: Numeric conversion not representable
1:Image              PC                Routine            Line        Source             
1:cesm.exe           00000000019272AD  Unknown               Unknown  Unknown
1:cesm.exe           000000000124DAA1  pio_support_mp_pi         118  pio_support.F90
1:cesm.exe           000000000124BC51  pio_utils_mp_chec          59  pio_utils.F90
1:cesm.exe           000000000134E9DA  pionfwrite_mod_mp         250  pionfwrite_mod.F90.in
1:cesm.exe           000000000130D2AA  piodarray_mp_writ         650  piodarray.F90.in
1:cesm.exe           000000000130FF51  piodarray_mp_writ         221  piodarray.F90.in
1:cesm.exe           0000000000C93F30  rtmio_mp_ncd_io_r        1825  RtmIO.F90
1:cesm.exe           0000000000C86EF0  rtmhistfile_mp_ht         930  RtmHistFile.F90
1:cesm.exe           0000000000C8476C  rtmhistfile_mp_rt        1099  RtmHistFile.F90
1:cesm.exe           0000000000CA0ED9  rtmmod_mp_rtmrun_        1309  RtmMod.F90
1:cesm.exe           0000000000C795A8  rof_comp_mct_mp_r         298  rof_comp_mct.F90
1:cesm.exe           0000000000424B34  component_mod_mp_         728  component_mod.F90
1:cesm.exe           000000000040A369  cime_comp_mod_mp_        2675  cime_comp_mod.F90
1:cesm.exe           0000000000424862  MAIN__                    103  cime_driver.F90
1:cesm.exe           000000000040829E  Unknown               Unknown  Unknown
1:libc-2.19.so       00002AAAB04DCB25  __libc_start_main     Unknown  Unknown
1:cesm.exe           00000000004081A9  Unknown               Unknown  Unknown
1:MPT ERROR: Rank 1(g:1) is aborting with error code 1.

The likely cause is that the data contains NaNs, which cannot be converted to real(r4).
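For reference, error -60 in the log above is netCDF's NC_ERANGE code, which is the "Numeric conversion not representable" message. As a rough way to see which kinds of r8 values can and cannot be narrowed to r4, here is a minimal sketch using only Python's standard library. It illustrates IEEE single-precision narrowing in general, not the PIO or netCDF code path; the helper names `to_r4` and `fits_in_r4` are hypothetical.

```python
import math
import struct


def to_r4(value):
    """Round-trip a Python float (r8) through a 4-byte IEEE float (r4)."""
    # struct raises OverflowError when a finite value is too large for r4.
    return struct.unpack("<f", struct.pack("<f", value))[0]


def fits_in_r4(value):
    """Return True if the value can be stored in single precision."""
    try:
        to_r4(value)
        return True
    except OverflowError:
        return False


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the failing run.
    for label, value in [("ordinary", 1.5), ("nan", float("nan")), ("huge", 1.0e39)]:
        print(label, fits_in_r4(value))
```

In this sketch a NaN survives the narrowing, while a finite value beyond the single-precision range does not. Whether either case applies to the failing RTM fields is not established by the log alone.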

ekluzek commented 6 years ago

OK, so the problem isn't NaNs or Inf or anything like that. But the problems are serious enough that we will back out the conversion to single precision at the point where history is written. Instead, we will use code similar to CTSM's, where the history accumulators are themselves single precision and are written out as they are, with no conversion. That mechanism has proven more robust, and it catches conversion errors earlier in the process.
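The idea of keeping the accumulator in single precision, so that an unrepresentable value is caught when it is accumulated rather than when the file is written, can be sketched roughly as follows. This is an illustration of the concept only, written in Python with the standard library; it is not the RTM or CTSM implementation, and the class `R4Accumulator` and the field name used below are hypothetical.

```python
import struct


class R4Accumulator:
    """Hypothetical running-mean accumulator whose storage is single precision."""

    def __init__(self, name):
        self.name = name
        self.total = 0.0  # always holds a value representable in r4
        self.count = 0

    @staticmethod
    def _to_r4(value):
        # Narrow to a 4-byte IEEE float; raises OverflowError if out of range.
        return struct.unpack("<f", struct.pack("<f", value))[0]

    def add(self, value):
        """Accumulate one sample, failing immediately if the sum leaves r4 range."""
        try:
            self.total = self._to_r4(self.total + value)
        except OverflowError as err:
            raise ValueError(
                f"{self.name}: sample {value!r} does not fit in r4"
            ) from err
        self.count += 1

    def mean(self):
        """Return the accumulated mean; no further type conversion is needed."""
        return self.total / self.count if self.count else 0.0


if __name__ == "__main__":
    acc = R4Accumulator("EXAMPLE_FIELD")  # hypothetical field name
    acc.add(1.0)
    acc.add(3.0)
    print(acc.mean())
```

Because the narrowing happens inside `add`, a bad sample is reported together with the field name at the time step where it appears, and the write step only has to output values that are already single precision.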