NGEET / fates

repository for the Functionally Assembled Terrestrial Ecosystem Simulator (FATES)
Other
105 stars 92 forks source link

New patch insertion method #1155

Closed ckoven closed 4 weeks ago

ckoven commented 10 months ago

Sort patches in a simple way that orders them by land use, no comp PFT, and age.

Description:

As part of #1116, I realized there is a simpler way to sort the patches that takes into account all of their continuous (age) and categorical (possibly LU and/or PFT) variables, without needing to have a lot of nested logic. This way the patches will be in some predictable order, which may or may not be important, but seems better than not having them in any predictable order. I separated this from the rest of #1116 and submit it here via a git cherry-pick onto main because it should almost certainly introduce non-B4B changes.

The idea is to make a pseudo-age that takes into account the various patch labels, and then sort by that. The pseudo age is to take some large number N that a patch will never be older than (10,000 years, nominally), and make the pseudo-age = LU * N*2 + PFT N + age.

Collaborators:

Discussed with @glemieux.

Expectation of Answer Changes:

I would expect this to change answers.

Checklist

If this is your first time contributing, please read the CONTRIBUTING document.

All checklist items must be checked to enable merging this pull request:

Contributor

Integrator

Documentation

Test Results:

CTSM (or) E3SM (specify which) test hash-tag:

CTSM (or) E3SM (specify which) baseline hash-tag:

FATES baseline hash-tag:

Test Output:

glemieux commented 9 months ago

@ckoven thanks for isolating this in a new PR. Is this required for #1116 or will that PR run without this update?

ckoven commented 9 months ago

I don't think there are any required dependencies either way.

rgknox commented 2 months ago

@ckoven I think this looks solid. We may also want to pre-multiply any constants into compiled constants in the new function. Compilers may already to it, but why not avoid math when possible.

rgknox commented 1 month ago

Plenty of diffs with base generated on this test:

ryan@derecho:/glade/derecho/scratch/rgknox/tests_1018-111221de> ./cs.status.fails 
1018-111221de_gnu: 10 tests
    FAIL ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.derecho_gnu.clm-FatesColdTwoStreamNoCompFixedBioGeo COMPARE_base_rest (EXPECTED FAILURE)
    FAIL ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.derecho_gnu.clm-FatesColdTwoStreamNoCompFixedBioGeo BASELINE fates-sci.1.79.0_api.36.1.0-ctsm5.3.006: DIFF
    FAIL PEM_D_Ld15.f10_f10_mg37.I2000Clm50FatesRs.derecho_gnu.clm-FatesColdSeedDisp COMPARE_base_modpes (EXPECTED FAILURE)
    FAIL SMS_D_Ld3.f09_g17.I2000Clm60FatesSpCruRsGs.derecho_gnu.clm-FatesColdSatPhen_prescribed RUN time=184 (EXPECTED FAILURE)
    PASS SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Fates.derecho_gnu.clm-FatesPRISM--clm-NEON-FATES-YELL SHAREDLIB_BUILD time=301 (UNEXPECTED: expected FAIL)
    PASS SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Fates.derecho_gnu.clm-FatesPRISM--clm-NEON-FATES-YELL RUN time=121 (UNEXPECTED: expected FAIL)

1018-111221de_int: 38 tests
    FAIL ERP_Ld9.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdAllVars COMPARE_base_rest
    FAIL ERP_P128x2_Ld30.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhen COMPARE_base_rest
    FAIL ERP_P256x2_Ld30.f45_f45_mg37.I2000Clm60FatesRs.derecho_intel.clm-mimicsFatesCold RUN time=105 (EXPECTED FAILURE)
    PEND ERP_P256x2_Ld30.f45_f45_mg37.I2000Clm60FatesRs.derecho_intel.clm-mimicsFatesCold COMPARE_base_rest
    FAIL ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.derecho_intel.clm-FatesColdTwoStream COMPARE_base_rest (EXPECTED FAILURE)
    PEND ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdLandUse SHAREDLIB_BUILD (UNEXPECTED: expected FAIL)
    FAIL ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdLUH2 BASELINE fates-sci.1.79.0_api.36.1.0-ctsm5.3.006: DIFF
    FAIL ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdLUH2HarvestArea BASELINE fates-sci.1.79.0_api.36.1.0-ctsm5.3.006: DIFF
    FAIL ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdLUH2HarvestMass BASELINE fates-sci.1.79.0_api.36.1.0-ctsm5.3.006: DIFF
    FAIL ERS_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdNoCompFixedBioGeo BASELINE fates-sci.1.79.0_api.36.1.0-ctsm5.3.006: DIFF
    FAIL ERS_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdNoComp BASELINE fates-sci.1.79.0_api.36.1.0-ctsm5.3.006: DIFF
    FAIL ERS_Ld30.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhen COMPARE_base_rest
    FAIL ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdLogging BASELINE fates-sci.1.79.0_api.36.1.0-ctsm5.3.006: DIFF
    FAIL ERS_Lm12.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesFireLightningPopDens COMPARE_base_rest
    FAIL ERS_Lm13.f45_f45_mg37.I2000Clm50Fates.derecho_intel.clm-FatesColdNoComp BASELINE fates-sci.1.79.0_api.36.1.0-ctsm5.3.006: DIFF
    FAIL ERS_P128x1_Lm25.f10_f10_mg37.I2000Clm60Fates.derecho_intel.clm-FatesColdNoComp BASELINE fates-sci.1.79.0_api.36.1.0-ctsm5.3.006: DIFF
    FAIL PVT_Lm3.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesLUPFT BASELINE fates-sci.1.79.0_api.36.1.0-ctsm5.3.006: DIFF
    PASS SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Fates.derecho_intel.clm-FatesFireLightningPopDens--clm-NEON-FATES-NIWO SHAREDLIB_BUILD time=338 (UNEXPECTED: expected FAIL)
glemieux commented 1 month ago

I decided to investigate the differences that we're seeing with non-landuse test cases and found what I think is ultimately a minor issue, but something that maybe should be addressed. I noticed that the diffs are happening immediately on the second timestep (i.e. the first timestep after initialization). Writing out the variables necessary for calculating the psuedopatch age and picking a random process, I found that the patch order was incorrect based on the calculated psuedo age.

What I think is happening is that the new method flips the assumed order of the nocomp patches on a site via the psuedo patch age calculation. In the new method, higher pft numbers are associated with "older" patches (i.e. pfts 1 -> 14 go younger to older). This is in contrast to the way we initialize the patches which is to iterate starting at pft 1: https://github.com/NGEET/fates/blob/f5bd62506e59adf232427dce9d4993483c8b14b3/main/EDInitMod.F90#L827-L830 and then insert the patches in that order: https://github.com/NGEET/fates/blob/f5bd62506e59adf232427dce9d4993483c8b14b3/main/EDInitMod.F90#L853-L876

Given that the initial layout is 1 -> 14, oldest to youngest, and given that InsertPatch is only ever called except upon disturbance, this means that it will take some time for the patches to flip to the assumed state in the new method.

As such, the fix for this would be to either flip the initialization scheme or flip the psuedo patch age scheme.

glemieux commented 1 month ago

Retesting just the FatesColdNoComp testmod using the fix pushed via ea22249 is still show diffs against baseline, but looking at the additional diagnostics I added it seems like this is mostly just down to roundoff variation in the initial patch fusions. I'm going to rerun the tests against baselines with the last updates.

glemieux commented 4 weeks ago

Regression testing on derecho completed over the weekend. The results are mostly b4b, with the exception of the landuse and nocomp based tests, as expected. The satellite phenology only differ in the fieldlist per 147d411. @rgknox there is one issue in that the ST3 test fails run as it can't find FATES_ERROR_EL which is include in the test user_nl_clm. I can fix this locally to make sure the test runs sucessfully, but I wanted to double check that we should work to remove this from the testmod in the future.

test location: /glade/u/home/glemieux/scratch/ctsm-tests/tests_pr1155-restest-1

glemieux commented 4 weeks ago

Manually editing the ST3 test user namelist to remove FATES_ERROR_EL results in only fieldlist differences for that test. I've made a ctsm issue to make sure we remove this at the next CTSM PR update opportunity: https://github.com/ESCOMP/CTSM/issues/2850