ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/

-fire_emis NoAnthro tests fail because surface dataset is 16pft rather than 78 to match the new fire-emis file #2759

ekluzek closed this 1 month ago

ekluzek commented 1 month ago

Brief summary of bug

SMS_D_Ld3_PS.f09_g17.I1850Clm60SpNoAnthro.derecho_intel.clm-decStart1851_noinitial

test fails because of a glitch in the fire emissions, which were turned on in the ctsm5.3.0 prototype.

General bug information

CTSM version you are using: branch_tags/ctsm5.3.n03_ctsm5.2.028-20-g317dc11d0

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected:

Some Sp simulations with -fire_emis on that use the fire_emission_factors_78PFTs_c20240624.nc file

I don't see what is different about this test compared to the ones that pass.

Here's the list of Sp tests with fire_emis on that pass:

ERP_D_Ld3_PS.f09_g17.I2000Clm50Sp.derecho_intel.clm-prescribed
ERP_D_Ld5.f10_f10_mg37.I2000Clm60Sp.derecho_intel.clm-decStart
ERP_D_Ld5.f10_f10_mg37.IHistClm45Sp.derecho_intel.clm-decStart
ERP_D_Ld5.f10_f10_mg37.IHistClm50SpCru.derecho_gnu.clm-drydepnomegan
ERP_D_Ld5.f10_f10_mg37.IHistClm60Sp.derecho_intel.clm-default
ERP_D_Ld5.ne30pg3_t232.IHistClm51Sp.derecho_intel.clm-default
ERP_P64x2_D.f10_f10_mg37.I2000Clm50SpRtmFl.derecho_intel.clm-default
ERP_P64x2_D_Ld10.f10_f10_mg37.IHistClm50SpG.derecho_intel.clm-glcMEC_decrease
ERP_P64x2_D_Ld5.f10_f10_mg37.I2000Clm45Sp.derecho_intel.clm-default
ERP_P64x2_D_Ld5.f10_f10_mg37.I2000Clm50Sp.derecho_gnu.clm-default
ERS_D_Ld10.f10_f10_mg37.IHistClm50Sp.derecho_intel.clm-collapse_pfts_78_to_16_decStart_f10
NCK_Ld1.f10_f10_mg37.I2000Clm50Sp.derecho_intel.clm-default
SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.derecho_intel.clm-ptsRLA
SMS_D_Ld1_PS.f09_g17.I1850Clm50Sp.derecho_intel.clm-default
SMS_D_Ld1_PS.f19_f19_mg17.I2010Clm50Sp.derecho_intel.clm-clm50cam6LndTuningMode
SMS_D_Ln9_P128x3.f19_g17.IHistClm50Sp.derecho_intel.clm-waccmx_offline
SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60SpRs.derecho_intel.clm-default--clm-NEON-TOOL
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long
SMS_Ln9.f10_f10_mg37.I2000Clm50Sp.derecho_gnu.clm-clm50cam5LndTuningModeZDustSoilErod
SMS_Ln9.ne30pg2_ne30pg2_mg17.I1850Clm50Sp.derecho_intel.clm-clm50cam6LndTuningMode
SMS_Ln9.ne3pg3_ne3pg3_mg37.I2000Clm50Sp.derecho_gnu.clm-clm50cam6LndTuningMode
SMS_P384x2_D_Ld5.f19_g17.I2000Clm50Sp.derecho_intel.clm-default

Details of bug

It turns out that we run with -fire_emis on for almost all of our tests (the exception being the FATES tests). Note that with Sp compsets the coupler fire variables are just output as missing, so there really isn't a good reason to run Sp compsets with fire_emis on.

Important output or errors that show the problem

dec2247.hsn.de.hpc.ucar.edu 764: forrtl: severe (408): fort: (2): Subscript #1 of the array FACTORS has value 17 which is greater than the upper bound of 16
dec2247.hsn.de.hpc.ucar.edu 764:
dec2247.hsn.de.hpc.ucar.edu 764: Image              PC                Routine            Line        Source
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           000000000255AACF  fireemisfactorsmo          76  FireEmisFactorsMod.F90
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           000000000125BDFE  cnfireemissionsmo          68  CNFireEmissionsMod.F90
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           0000000000AB6ED2  clm_instmod_mp_cl         400  clm_instMod.F90
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           0000000000AA7F21  clm_initializemod         409  clm_initializeMod.F90
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           00000000009AF478  lnd_comp_nuopc_mp         659  lnd_comp_nuopc.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E8279  callVFuncPtr             2167  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E72B8  ESMCI_FTableCallE         824  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF7684AB2  enter                    2501  ESMCI_VMKernel.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF766D346  enter                    1216  ESMCI_VM.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E865F  c_esmc_ftablecall         981  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF7C6C4FC  esmf_compmod_mp_e        1252  ESMF_Comp.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF853B87E  esmf_gridcompmod_        1419  ESMF_GridComp.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF8FC4D11  nuopc_driver_mp_l        2889  NUOPC_Driver.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF8FAE640  nuopc_driver_mp_i        1982  NUOPC_Driver.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E8279  callVFuncPtr             2167  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E72B8  ESMCI_FTableCallE         824  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF7684AB2  enter                    2501  ESMCI_VMKernel.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF766D346  enter                    1216  ESMCI_VM.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E865F  c_esmc_ftablecall         981  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF7C6C4FC  esmf_compmod_mp_e        1252  ESMF_Comp.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF853B87E  esmf_gridcompmod_        1419  ESMF_GridComp.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF8FC4D11  nuopc_driver_mp_l        2889  NUOPC_Driver.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF8FAE885  nuopc_driver_mp_i        1987  NUOPC_Driver.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF8F762EE  nuopc_driver_mp_i         487  NUOPC_Driver.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E8279  callVFuncPtr             2167  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E72B8  ESMCI_FTableCallE         824  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF7684AB2  enter                    2501  ESMCI_VMKernel.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF766D346  enter                    1216  ESMCI_VM.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF71E865F  c_esmc_ftablecall         981  ESMCI_FTable.C
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF7C6C4FC  esmf_compmod_mp_e        1252  ESMF_Comp.F90
dec2247.hsn.de.hpc.ucar.edu 764: libesmf.so         0000146DF853B87E  esmf_gridcompmod_        1419  ESMF_GridComp.F90
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           0000000000448EC8  MAIN__                    128  esmApp.F90
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           000000000042167D  Unknown               Unknown  Unknown
dec2247.hsn.de.hpc.ucar.edu 764: libc-2.31.so       0000146DE9E5B29D  __libc_start_main     Unknown  Unknown
dec2247.hsn.de.hpc.ucar.edu 764: cesm.exe           00000000004215AA  Unknown               Unknown  Unknown
ekluzek commented 1 month ago

Note, I'm also seeing this for the Bgc NoAnthro test:

SMS_D_Ld3_PS.f09_g17.I1850Clm60BgcNoAnthro.derecho_intel.clm-decStart1851_noinitial--clm-matrixcnOn

so the problem is somehow linked to the NoAnthro setup, and not just to whether the test is Sp, Bgc, or BgcCrop.

ekluzek commented 1 month ago

In the standup, @samsrabin suggested a couple of ideas to try, which are in the discussion below:

samsrabin commented 1 month ago

This patch should solve it, although I haven't even tested whether it builds:

diff --git a/src/biogeochem/FireEmisFactorsMod.F90 b/src/biogeochem/FireEmisFactorsMod.F90
index e97082c0b..7f7f470f3 100644
--- a/src/biogeochem/FireEmisFactorsMod.F90
+++ b/src/biogeochem/FireEmisFactorsMod.F90
@@ -11,6 +11,7 @@ module FireEmisFactorsMod
   use shr_kind_mod, only : r8 => shr_kind_r8
   use abortutils,   only : endrun
   use clm_varctl,   only : iulog
+  use clm_varpar,   only : maxveg
 !
   implicit none
   private
@@ -20,8 +21,6 @@ module FireEmisFactorsMod
   public :: fire_emis_factors_init
   public :: fire_emis_factors_get

-! !PRIVATE MEMBERS:
-  integer :: npfts ! number of plant function types
 !
   type emis_eff_t
      real(r8), pointer :: eff(:) ! emissions efficiency factor
@@ -73,10 +72,7 @@ contains
        call endrun(errmes)
     endif

-    factors(:npfts) = comp_factors_table( ndx )%eff(:npfts)
-    if ( size(factors) > npfts )then
-       factors(npfts+1:) = comp_factors_table( ndx )%eff(nc3crop)
-    end if
+    factors(:maxveg) = comp_factors_table( ndx )%eff(:maxveg)
     molecwght  = comp_factors_table( ndx )%wght

   end subroutine fire_emis_factors_get
@@ -126,9 +122,8 @@ contains
     call ncd_inqdlen( ncid, dimid, n_comps, name='Comp_Num')
     call ncd_inqdlen( ncid, dimid, n_pfts, name='PFT_Num')

-    npfts = n_pfts
-    if ( npfts /= mxpft .and. npfts /= 16 )then
-       call endrun('Number of PFTs on fire emissions file is NOT correct. Its neither the total number of PFTS nor 16')
+    if ( n_pfts < maxveg )then
+       call endrun('Number of PFTs on fire emissions file is less than the number of PFTs in the run')
     end if

     ierr = pio_inq_varid(ncid,'Comp_EF',  comp_ef_vid)
@@ -146,7 +141,7 @@ contains
     call  bld_hash_table_indices( comp_names )
     do i=1,n_comps
        start=(/i,1/)
-       count=(/1,npfts/)
+       count=(/1,n_pfts/)
        ierr = pio_get_var( ncid, comp_ef_vid,  start, count, comp_factors )

        call enter_hash_data( trim(comp_names(i)), comp_factors, comp_molecwghts(i)  )
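The heart of the patch is that only the first maxveg factors are copied out of the (possibly larger) array read from the file, and the read step aborts only when the file has fewer PFTs than the run. Below is a minimal sketch of that logic as a standalone Fortran module, not the CTSM source: the module and routine names are hypothetical, maxveg is passed in as an argument rather than taken from clm_varpar, the file-size check is folded into the copy routine, and an error flag stands in for endrun.

```fortran
! Hypothetical standalone sketch of the copy logic in the patch above.
! Assumptions: maxveg is an argument (CTSM gets it from clm_varpar),
! and ierr replaces the call to endrun.
module fire_emis_copy_sketch
  implicit none
  private
  public :: copy_fire_emis_factors

  integer, parameter :: r8 = selected_real_kind(12)

contains

  ! Copy the first maxveg emission factors from eff (one entry per PFT
  ! on the file) into factors. Sets ierr = 1 if the file has fewer PFTs
  ! than the run needs, or if factors is too small to hold maxveg values.
  subroutine copy_fire_emis_factors(eff, maxveg, factors, ierr)
    real(r8), intent(in)  :: eff(:)
    integer,  intent(in)  :: maxveg
    real(r8), intent(out) :: factors(:)
    integer,  intent(out) :: ierr

    ierr = 0
    if (size(eff) < maxveg .or. size(factors) < maxveg) then
       ierr = 1
       return
    end if
    ! Only maxveg entries are copied, so a 78-PFT file can be used
    ! with a 16-PFT run without writing past the end of factors.
    factors(:maxveg) = eff(:maxveg)
  end subroutine copy_fire_emis_factors

end module fire_emis_copy_sketch
```

For example, passing a 78-element eff with maxveg = 16 fills all 16 entries of factors and leaves entries 17 through 78 of the file unused, whereas the old factors(:npfts) assignment tried to write 78 values into a 16-element array.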
ekluzek commented 1 month ago

I replicated the issue with a standard 16-pft dataset (so not a NoAnthro one) on Izumi with the nag compiler, which gives the following error:

[i017.cgd.ucar.edu:mpi_rank_29][error_sighandler] Caught error: Aborted (signal 6)
Runtime Error: /fs/cgd/data0/erik/ctsm_worktree/quickfix/src/biogeochem/FireEmisFactorsMod.F90, line 76: Subscript 1 of FACTORS (value 78) is out of range (1:16)
Program terminated by fatal error
/fs/cgd/data0/erik/ctsm_worktree/quickfix/src/biogeochem/FireEmisFactorsMod.F90, line 76: Error occurred in FIREEMISFACTORSMOD:FIRE_EMIS_FACTORS_GET
/fs/cgd/data0/erik/ctsm_worktree/quickfix/src/biogeochem/CNFireEmissionsMod.F90, line 68: Called by CNFIREEMISSIONSMOD:INIT
/fs/cgd/data0/erik/ctsm_worktree/quickfix/src/main/clm_instMod.F90, line 400: Called by CLM_INSTMOD:CLM_INSTINIT
/fs/cgd/data0/erik/ctsm_worktree/quickfix/src/main/clm_initializeMod.F90, line 409: Called by CLM_INITIALIZEMOD:INITIALIZE2
/fs/cgd/data0/erik/ctsm_worktree/quickfix/src/cpl/nuopc/lnd_comp_nuopc.F90, line 659: Called by LND_COMP_NUOPC:INITIALIZEREALIZE
/fs/cgd/data0/erik/ctsm_worktree/quickfix/components/cmeps/cime_config/../cesm/driver/esmApp.F90, line 128: Called by ESMAPP
[i017.cgd.ucar.edu:mpi_rank_10][error_sighandler] Caught error: Aborted (signal 6)
[i017.cgd.ucar.edu:mpi_rank_21][error_sighandler] Caught error: Aborted (signal 6)
[i017.cgd.ucar.edu:mpi_rank_0][error_sighandler] Caught error: Aborted (signal 6)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 243471 RUNNING AT i017.cgd.ucar.edu
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
ekluzek commented 1 month ago

This was resolved in ctsm5.3.0.