ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/
Other
310 stars 314 forks source link

hist_fexcl and hist_fincl do not get appropriately checked together #2867

Open adrifoster opened 2 weeks ago

adrifoster commented 2 weeks ago

Brief summary of bug

If a variable is in both hist_fexcl and hist_fincl the code seems to assume hist_fincl. This can result in a crash if having the variable doesn't work for your code.

General bug information

CTSM version you are using: ctsm5.3.009

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: any

Details of bug

In trying to test ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdST3 I got a fail because ST3 mode does not output FATES_ERROR_EL. This variable is in the hist_fexcl1 but it is also in the hist_fincl1. The run submits but then fails when it reads the variable list:

dec0617.hsn.de.hpc.ucar.edu 612:  not found in the list of fates_hist%hvars.
dec0610.hsn.de.hpc.ucar.edu 188:  ERROR: ERROR in clmfates_interfaceMod.F90 at line 670
dec0613.hsn.de.hpc.ucar.edu 273:  not found in the list of fates_hist%hvars.
dec0614.hsn.de.hpc.ucar.edu 427:  ERROR: ERROR in clmfates_interfaceMod.F90 at line 670
dec0617.hsn.de.hpc.ucar.edu 612:  Most likely, this is because this history variable
dec0610.hsn.de.hpc.ucar.edu 189:  the history field: FATES_ERROR_EL
dec0613.hsn.de.hpc.ucar.edu 273:  Most likely, this is because this history variable
dec0617.hsn.de.hpc.ucar.edu 612:  was specified in the user namelist, but the user
dec0610.hsn.de.hpc.ucar.edu 189:  was requested in the namelist, but was
dec0613.hsn.de.hpc.ucar.edu 273:  was specified in the user namelist, but the user

Important details of your setup / configuration so we can reproduce the bug

 hist_fexcl1 = 'FATES_ERROR_EL'
 hist_fields_list_file = .false.
 hist_fincl1 = 'FATES_NCOHORTS', 'FATES_TRIMMING', 'FATES_AREA_PLANTS', 'FATES_AREA_TREES', 'FATES_COLD_STATUS',
         'FATES_GDD', 'FATES_NCHILLDAYS', 'FATES_NCOLDDAYS', 'FATES_DAYSINCE_COLDLEAFOFF', 'FATES_DAYSINCE_COLDLEAFON',
         'FATES_CANOPY_SPREAD', 'FATES_NESTEROV_INDEX', 'FATES_IGNITIONS', 'FATES_FDI', 'FATES_ROS',
         'FATES_EFFECT_WSPEED', 'FATES_FUELCONSUMED', 'FATES_FIRE_INTENSITY', 'FATES_FIRE_INTENSITY_BURNFRAC', 'FATES_BURNFRAC',
         'FATES_FUEL_MEF', 'FATES_FUEL_BULKD', 'FATES_FUEL_EFF_MOIST', 'FATES_FUEL_SAV', 'FATES_FUEL_AMOUNT',
         'FATES_LITTER_IN', 'FATES_LITTER_OUT', 'FATES_SEED_BANK', 'FATES_SEEDS_IN', 'FATES_STOREC',
         'FATES_VEGC', 'FATES_SAPWOODC', 'FATES_LEAFC', 'FATES_FROOTC', 'FATES_REPROC',
         'FATES_STRUCTC', 'FATES_NONSTRUCTC', 'FATES_VEGC_ABOVEGROUND', 'FATES_CANOPY_VEGC', 'FATES_USTORY_VEGC',
         'FATES_PRIMARY_PATCHFUSION_ERR', 'FATES_HARVEST_WOODPROD_C_FLUX', 'FATES_DISTURBANCE_RATE_FIRE', 'FATES_DISTURBANCE_RATE_LOGGING', 'FATES_DISTURBANCE_RATE_TREEFALL',
         'FATES_STOMATAL_COND', 'FATES_LBLAYER_COND', 'FATES_NPP', 'FATES_GPP', 'FATES_AUTORESP',
         'FATES_GROWTH_RESP', 'FATES_MAINT_RESP', 'FATES_GPP_CANOPY', 'FATES_AUTORESP_CANOPY', 'FATES_GPP_USTORY',
         'FATES_AUTORESP_USTORY', 'FATES_DEMOTION_CARBONFLUX', 'FATES_PROMOTION_CARBONFLUX', 'FATES_MORTALITY_CFLUX_CANOPY', 'FATES_MORTALITY_CFLUX_USTORY',
         'FATES_NEP', 'FATES_HET_RESP', 'FATES_FIRE_CLOSS', 'FATES_FIRE_FLUX_EL', 'FATES_CBALANCE_ERROR',
         'FATES_ERROR_EL', 'FATES_LEAF_ALLOC', 'FATES_SEED_ALLOC', 'FATES_STEM_ALLOC', 'FATES_FROOT_ALLOC',
         'FATES_CROOT_ALLOC', 'FATES_STORE_ALLOC'
glemieux commented 2 weeks ago

Is this me misunderstanding how hist_fexcl1 should work? Is it meant only to exclude variables that are added by default? Or is the issue that the lnd_in file isn't placing the testmod options after the included testmod?

adrifoster commented 2 weeks ago

I'm not sure, currently because of the includes for this test, the variable is added to the hist_fincl1

glemieux commented 2 weeks ago

:100: @adrifoster. My assumption was that the order of the test mod updates would impact how the lnd_in file would be generated and how that would override history variable inclusion, but I'm realizing that's likely incorrect.

I'm also realizing that the error message was brought in as part of #2339. ~@rgknox I think we might need to add a simple check against the hist_fexcl depending on what the expected behavior of the same variable being part of both hist_fincl and hist_fexcl:~ https://github.com/ESCOMP/CTSM/blob/26b9aa7bdd5b0c1807f0f9bc12036548329c47b8/src/utils/clmfates_interfaceMod.F90#L656-L665

UPDATE: see below

glemieux commented 2 weeks ago

Tagging #2850 here to note that this came up trying to address this issue.

glemieux commented 2 weeks ago

Note that commenting out the call to CrossRefHistoryFields I run into this error:

2937  hist_htapes_build Initializing clm2 history files
2938 ------------------------------------------------------------
2939  htapes_fieldlist ERROR: FATES_ERROR_EL in fincl(          66 )
2940  for history tape            1  not found
2941  ENDRUN:
2942  ERROR: ERROR in histFileMod.F90 at line 885

Looking at the code, the issue for this specific test, IIUC, is that we're including FATES_ERROR_EL in the base testmod, but the fates init_history_io isn't even adding it to the allhistlfldlist, appropriately, because of this new line in fates: https://github.com/NGEET/fates/blob/1be3962933351b94fef134d607ac7d821d884f96/main/FatesHistoryInterfaceMod.F90#L8465-L8483. So the failure mode noted in the original comment is the technically correct behavior. The issue to address that failure mode is really #2850.

That said, I've got a test going to check if the original issue title is still techincally correct and worth addressing.

glemieux commented 2 weeks ago

I tried replicating a version of this with a fates run mode test mod that I know will include FATES_ERROR_EL in the allhistfldlist. I did this by temporarily adding hist_fexcl1 = 'FATES_ERROR_EL' to the FatesCold user_nl_clm file, which inherits the Fates testmod user_nl_clm with that variable included. I then ran the FatesCold testmod and it successfully ran, but it included FATES_ERROR_EL.

So I guess I'm wondering if we should keep this issue open and address whether or not the behavior listed in the title is correct relative to this variant I just ran, or start a new issue/discussion?

adrifoster commented 2 weeks ago

@glemieux i'm not sure I understand what you are saying. I think what you did is in support of the issue listed here. or am I missing something?

glemieux commented 2 weeks ago

Sorry. What I'm saying is that the failure mode in the first comment is the expected result of issue #2850 and what the title suggests is not truly at fault.

That said, the question the title suggest still stands given the results of the variant I noted here: https://github.com/ESCOMP/CTSM/issues/2867#issuecomment-2463277294. As such, I was thinking we should start a new discussion to ask what the expected behavior of setting hist_fexcl and hist_fincl for the same history output should be since the comments in this issue might confuse readers and would start the conversation off more cleanly.

glemieux commented 2 weeks ago

Per discussion at the stand up meeting today, we decided to keep the issue open as we want the behavior for setting the same variable in hist_fexcl and hist_fincl to be different than failing at run time with the error message in the original post comment. Ideally this would either:

For the second option, the documentation should probably be updated to explain to the user the expected behavior as it will be opaque to the user.