E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.

ELM carbon state mismatch error between initial and final spin-up runs #4943

Closed serbinsh closed 2 years ago

serbinsh commented 2 years ago

I am running into an error when attempting to spin up ELM (v1.2 and v2). I get this error with a variety of cases, from CNP to BGC. The size of the mismatch varies, but it is always very large. It can decrease with longer spin-up periods, but no matter what I do, I can't get the second spin-up run to advance beyond one year because of this mismatch.

                                  gross primary product     0.21002100 |      1331267665.53
                                  ecosystem respiration    -0.18714354 |     -1186253506.84
                                            fire C loss    -0.01995164 |      -126468168.24
                       excess MR pool harvest mortality     0.00000000 |               0.00
            decomposition loss from 1-year product pool     0.00000000 |               0.00
           decomposition loss from 10-year product pool     0.00000000 |               0.00
          decomposition loss from 100-year product pool     0.00000000 |               0.00
                     SOM C loss from vertical transport     0.00000000 |               0.00
                                             SOM C loss     0.00000000 |               0.00
              flux to atmosphere due to dynamic weights     0.00000000 |               0.00
             seed source to leaf due to dynamic weights     0.00000000 |               0.00
       seed source to dead steam due to dynamic weights     0.00000000 |               0.00
                                                  *SUM*     0.00292582 |        18545990.45

 CARBON STATES (kgC/m2*1e6): period all_time: date =        20101           0

                                        beg              end          |        *NET CHANGE*
                    PFT          55524023.28         55726765.75      |          202742.47
                    CWD          12002474.93         11931661.91      |          -70813.02
           Total litter           1874931.47          1756333.26      |         -118598.21
              Total SOM         115415084.47        115457114.88      |           42030.41
     Total wood product                 0.00                0.00      |               0.00
        Truncation sink                 7.64                7.64      |               0.00
      Crop seed deficit                 0.00                0.00      |               0.00
               Grid-level Err           0.00               -0.00      |              -0.00
           *SUM*                           55361.65                   |           55361.65
 time integrated flux =    18545990.447155211
 net change in state  =    55361.649008180379
 error                =    18490628.798147030
 ENDRUN:ERROR in /E3SM/components/elm/src/biogeochem/CNPBudgetMod.F90 at line 916
 ERROR: Unknown error submitted to shr_abort_abort.

I was speaking with Ben Sulman and Fengming Yuan, who are aware of this issue and suggested that a bug fix may already have been provided. If not, Ben suggested the following:

Possible fix? https://github.com/bsulman/E3SM/commit/1328231cfc012e313dbcbbde25289b32965bf82e

However, as I mentioned, I am still struggling with this error using release versions 1.2 and 2.0. As a note, these are single-pixel, cpl_bypass runs.

I am providing this in case a fix hasn't been incorporated and in case it is felt that it would be good to add to the mainline code. Or perhaps this is user error?
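For readers following the log, the three summary lines at the end of the report are consistent with a simple subtraction: the reported error equals the time-integrated flux minus the net change in state. The sketch below only reproduces that arithmetic from the numbers printed above; the function name is hypothetical and this is not the code in CNPBudgetMod.F90.

```python
# Values copied from the end of the carbon budget report above (kgC/m2 * 1e6).
time_integrated_flux = 18545990.447155211
net_change_in_state = 55361.649008180379


def budget_error(flux, state_change):
    """Imbalance between integrated fluxes and the change in stored carbon.

    Hypothetical helper for illustration only; it mirrors the relationship
    visible in the log, not the actual ELM implementation.
    """
    return flux - state_change


error = budget_error(time_integrated_flux, net_change_in_state)

# Share of the integrated flux that is not matched by a change in the states.
unexplained_fraction = error / time_integrated_flux

print(f"error                = {error:.6f}")
print(f"unexplained fraction = {unexplained_fraction:.4f}")
```

Almost all of the all_time integrated flux is unmatched by a state change, which is what one would expect if the flux accumulator spans a much longer period than the state difference it is compared against.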

serbinsh commented 2 years ago

I forgot to note that this error happens after 1 year of the run that uses the initial restart file as initial conditions. This is how the error manifests every time: after 1 full year of the second spin-up run, on the first timestep of year 2.

 Beginning timestep   : 0001-12-31_14:00:00
 Beginning timestep   : 0001-12-31_15:00:00
 Beginning timestep   : 0001-12-31_16:00:00
 Beginning timestep   : 0001-12-31_17:00:00
 Beginning timestep   : 0001-12-31_18:00:00
 Beginning timestep   : 0001-12-31_19:00:00
 Beginning timestep   : 0001-12-31_20:00:00
 Beginning timestep   : 0001-12-31_21:00:00
 Beginning timestep   : 0001-12-31_22:00:00
 Beginning timestep   : 0001-12-31_23:00:00
 Beginning timestep   : 0002-01-01_00:00:00

 NET WATER FLUXES : period  monthly: date =        20101           0
                       Time     |      Time    
                     averaged   |    integrated
                   kg/m2s*1e6   |     kg/m2*1e6
            rain     0.00022450 |             601.29
            snow     0.00000827 |              22.16
            evap    -0.00005797 |            -155.27
          runoff    -0.00004018 |            -107.62
          frzrof     0.00000000 |               0.00
           irrig     0.00000000 |               0.00
           *SUM*     0.00013462 |             360.56

 WATER STATES (kg/m2*1e6): period  monthly: date =        20101           0
                          Canopy            Snow              SFC             Soil Liq          Soil Ice           Aquifer      Grid-level Err |           TOTAL   
             beg              0.00              0.00              0.00           6154.79              0.00          30354.03                   |           36508.83
             end              0.00              0.00              0.00           6553.31              1.83          30314.24                   |           36869.38
    *NET CHANGE*             -0.00              0.00              0.00            398.52              1.83            -39.79              0.00 |             360.56
       *SUM*                                                                      360.56                                                       |             360.56

 NET WATER FLUXES : period   annual: date =        20101           0
                       Time     |      Time    
                     averaged   |    integrated
                   kg/m2s*1e6   |     kg/m2*1e6
            rain     0.00021904 |            6907.72
            snow     0.00000665 |             209.68
            evap    -0.00018108 |           -5710.58
          runoff    -0.00004454 |           -1404.64
          frzrof     0.00000000 |               0.00
           irrig     0.00000000 |               0.00
           *SUM*     0.00000007 |               2.18

 WATER STATES (kg/m2*1e6): period   annual: date =        20101           0
                          Canopy            Snow              SFC             Soil Liq          Soil Ice           Aquifer      Grid-level Err |           TOTAL   
             beg              0.00              0.00              0.00           6451.41             65.38          30350.41                   |           36867.20
             end              0.00              0.00              0.00           6553.31              1.83          30314.24                   |           36869.38
    *NET CHANGE*              0.00              0.00              0.00            101.90            -63.55            -36.17              0.00 |               2.18
       *SUM*                                                                        2.18                                                       |               2.18

 NET WATER FLUXES : period all_time: date =        20101           0
                       Time     |      Time    
                     averaged   |    integrated
                   kg/m2s*1e6   |     kg/m2*1e6
            rain     0.00022395 |         1419577.68
            snow     0.00000652 |           41304.49
            evap    -0.00016411 |        -1040248.45
          runoff    -0.00006491 |         -411461.80
          frzrof    -0.00000000 |              -0.00
           irrig     0.00000000 |               0.00
           *SUM*     0.00000145 |            9171.92

 WATER STATES (kg/m2*1e6): period all_time: date =        20101           0
                          Canopy            Snow              SFC             Soil Liq          Soil Ice           Aquifer      Grid-level Err |           TOTAL   
             beg              0.00              0.00              0.00           6451.41             65.38          30350.41                   |           36867.20
             end              0.00              0.00              0.00           6553.31              1.83          30314.24                   |           36869.38
    *NET CHANGE*              0.00              0.00              0.00            101.90            -63.55            -36.17              0.00 |               2.18
       *SUM*                                                                        2.18                                                       |               2.18

 NET CARBON FLUXES : period  monthly: date =        20101           0
                                                              Time     |      Time    
                                                            averaged   |    integrated
                                                          kgC/m2/s*1   |    kgC/m2*1e6
                                  gross primary product     0.09185151 |          246015.08
                                  ecosystem respiration    -0.15389704 |         -412197.83
                                            fire C loss     0.00000000 |               0.00
                       excess MR pool harvest mortality     0.00000000 |               0.00
            decomposition loss from 1-year product pool     0.00000000 |               0.00
           decomposition loss from 10-year product pool     0.00000000 |               0.00
          decomposition loss from 100-year product pool     0.00000000 |               0.00
                     SOM C loss from vertical transport     0.00000000 |               0.00
                                             SOM C loss     0.00000000 |               0.00
              flux to atmosphere due to dynamic weights     0.00000000 |               0.00
             seed source to leaf due to dynamic weights     0.00000000 |               0.00
       seed source to dead steam due to dynamic weights     0.00000000 |               0.00
                                                  *SUM*    -0.06204553 |         -166182.76

 CARBON STATES (kgC/m2*1e6): period  monthly: date =        20101           0

                                        beg              end          |        *NET CHANGE*
                    PFT          96930511.97         96650672.12      |         -279839.85
                    CWD          30474447.56         30541655.31      |           67207.75
           Total litter           2869482.44          2904010.67      |           34528.23
              Total SOM         149697822.94        149709744.05      |           11921.11
     Total wood product                 0.00                0.00      |               0.00
        Truncation sink                 5.78                5.78      |               0.00
      Crop seed deficit                 0.00                0.00      |               0.00
               Grid-level Err           0.00               -0.00      |              -0.00
           *SUM*                         -166182.76                   |         -166182.76

 NET CARBON FLUXES : period   annual: date =        20101           0
                                                              Time     |      Time    
                                                            averaged   |    integrated
                                                          kgC/m2/s*1   |    kgC/m2*1e6
                                  gross primary product     0.28601945 |         9019909.38
                                  ecosystem respiration    -0.28482231 |        -8982156.47
                                            fire C loss     0.00000000 |               0.00
                       excess MR pool harvest mortality     0.00000000 |               0.00
            decomposition loss from 1-year product pool     0.00000000 |               0.00
           decomposition loss from 10-year product pool     0.00000000 |               0.00
          decomposition loss from 100-year product pool     0.00000000 |               0.00
                     SOM C loss from vertical transport     0.00000000 |               0.00
                                             SOM C loss     0.00000000 |               0.00
              flux to atmosphere due to dynamic weights     0.00000000 |               0.00
             seed source to leaf due to dynamic weights     0.00000000 |               0.00
       seed source to dead steam due to dynamic weights     0.00000000 |               0.00
                                                  *SUM*     0.00119714 |           37752.91

 CARBON STATES (kgC/m2*1e6): period   annual: date =        20101           0

                                        beg              end          |        *NET CHANGE*
                    PFT          96777414.30         96650672.12      |         -126742.18
                    CWD          30386977.40         30541655.31      |          154677.91
           Total litter           2917318.53          2904010.67      |          -13307.86
              Total SOM         149686619.02        149709744.05      |           23125.03
     Total wood product                 0.00                0.00      |               0.00
        Truncation sink                 5.78                5.78      |               0.00
      Crop seed deficit                 0.00                0.00      |               0.00
               Grid-level Err           0.00               -0.00      |              -0.00
           *SUM*                           37752.91                   |           37752.91

 NET CARBON FLUXES : period all_time: date =        20101           0
                                                              Time     |      Time    
                                                            averaged   |    integrated
                                                          kgC/m2/s*1   |    kgC/m2*1e6
                                  gross primary product     0.28186317 |      1786656212.03
                                  ecosystem respiration    -0.27711795 |     -1756577541.53
                                            fire C loss    -0.00007609 |         -482309.73
                       excess MR pool harvest mortality     0.00000000 |               0.00
            decomposition loss from 1-year product pool     0.00000000 |               0.00
           decomposition loss from 10-year product pool     0.00000000 |               0.00
          decomposition loss from 100-year product pool     0.00000000 |               0.00
                     SOM C loss from vertical transport     0.00000000 |               0.00
                                             SOM C loss     0.00000000 |               0.00
              flux to atmosphere due to dynamic weights     0.00000000 |               0.00
             seed source to leaf due to dynamic weights     0.00000000 |               0.00
       seed source to dead steam due to dynamic weights     0.00000000 |               0.00
                                                  *SUM*     0.00466913 |        29596360.77

 CARBON STATES (kgC/m2*1e6): period all_time: date =        20101           0

                                        beg              end          |        *NET CHANGE*
                    PFT          96777414.30         96650672.12      |         -126742.18
                    CWD          30386977.40         30541655.31      |          154677.91
           Total litter           2917318.53          2904010.67      |          -13307.86
              Total SOM         149686619.02        149709744.05      |           23125.03
     Total wood product                 0.00                0.00      |               0.00
        Truncation sink                 5.78                5.78      |               0.00
      Crop seed deficit                 0.00                0.00      |               0.00
               Grid-level Err           0.00               -0.00      |              -0.00
           *SUM*                           37752.91                   |           37752.91
 time integrated flux =    29596360.773535788     
 net change in state  =    37752.906787120817     
 error                =    29558607.866748668     
 ENDRUN:ERROR in /E3SM/components/elm/src/biogeochem/CNPBudgetMod.F90 at line 916
 ERROR: Unknown error submitted to shr_abort_abort.

serbinsh commented 2 years ago

Another update - I just implemented this change to my local version of the ELM v2.0.0 source code in the file:


  37 use elm_instMod
  38 use WaterBudgetMod         , only : WaterBudget_Reset
  39 use CNPBudgetMod           , only : CNPBudget_Reset
  40 use elm_varctl             , only : do_budgets

Added use CNPBudgetMod , only : CNPBudget_Reset on line 39


732   ! Prevent situation on restart where states get reset at nstep=1 but cumulative fluxes never get reset
733   ! temporary restart bugfix - needs to be tested!!
734   if (get_nstep() <= 1 .and. do_budgets) then
735       call WaterBudget_Reset('all')
736       call CNPBudget_Reset('all')
737    endif

I added lines 732-737 just after

    ! ------------------------------------------------------------------------
    ! Read restart/initial info 
    ! ------------------------------------------------------------------------

    if (nsrest == nsrStartup) then

and before

    ! ------------------------------------------------------------------------
    ! If appropriate, create interpolated initial conditions
    ! ------------------------------------------------------------------------

    if (nsrest == nsrStartup .and. finidat_interp_source /= ' ') then

I then set up a spin-up run and a continued spin-up run with COMPSET I1850GSWELMBGC, where the second case starts up from the final restart file of the first spin-up case. This is what I was doing before, except this time I did not get a C budget error. So it does seem that @bsulman's code fix would work here? I haven't tested it very extensively yet, but so far it seems to be working for me.

Next I am going to try another set of test runs using OLMT and my edited version of ELM v2
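The failure mode described in the code comment above (states reset at nstep=1 on restart while cumulative fluxes are never reset) can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `ToyBudget` and `run` are hypothetical names, the numbers are made up, and none of this is ELM code. It only shows why a carried-over flux accumulator produces a large mismatch and why resetting both sides at startup removes it.

```python
from dataclasses import dataclass


@dataclass
class ToyBudget:
    """Hypothetical cumulative budget tracker; not ELM code."""
    beg_state: float = 0.0  # stored carbon when the accounting period began
    cum_flux: float = 0.0   # net flux integrated since the period began

    def reset(self, state: float) -> None:
        # Reset BOTH sides of the budget so they cover the same period.
        self.beg_state = state
        self.cum_flux = 0.0

    def step(self, flux: float) -> None:
        self.cum_flux += flux

    def error(self, end_state: float) -> float:
        return self.cum_flux - (end_state - self.beg_state)


def run(budget: ToyBudget, state: float, fluxes: list) -> float:
    """Advance the state by each net flux and record it in the budget."""
    for flux in fluxes:
        state += flux
        budget.step(flux)
    return state


# First spin-up: the budget closes.
first = ToyBudget()
first.reset(100.0)
state_after_first = run(first, 100.0, [5.0] * 10)
first_error = first.error(state_after_first)

# Second run started from the restart file, buggy case: the beginning state
# is re-initialised, but the flux accumulator is carried over from run one.
buggy = ToyBudget(beg_state=state_after_first, cum_flux=first.cum_flux)
buggy_error = buggy.error(run(buggy, state_after_first, [1.0] * 10))

# Same restart, but with an explicit reset at startup.
fixed = ToyBudget(beg_state=0.0, cum_flux=first.cum_flux)
fixed.reset(state_after_first)
fixed_error = fixed.error(run(fixed, state_after_first, [1.0] * 10))

print(first_error, buggy_error, fixed_error)
```

In the buggy case the mismatch equals the flux accumulated during the first run, which matches the pattern in the reports above, where the all_time flux total is far larger than the one-year state change it is compared against.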

rljacob commented 2 years ago

tagging @susburrows

serbinsh commented 2 years ago

An additional update: After making this change I was able to work through a series of @dmricciuto OLMT simulations, from AD spin-up through final spin-up to a transient case. Previously, with v2.0.0, I was hitting the same C balance error I reported above when restarting after the AD spin-up run. In fact, I identified this issue when trying to use OLMT to set up ensemble experiments with ELM v2.

Also CC @fmyuan who I also discussed this with.

serbinsh commented 2 years ago

Except I seem to have run into a new balance error, though this time the mismatch seems much smaller??

 Beginning timestep   : 0116-07-31_15:00:00
 Beginning timestep   : 0116-07-31_16:00:00
 Beginning timestep   : 0116-07-31_17:00:00
 Beginning timestep   : 0116-07-31_18:00:00
 Beginning timestep   : 0116-07-31_19:00:00
 Beginning timestep   : 0116-07-31_20:00:00
 Beginning timestep   : 0116-07-31_21:00:00
 Beginning timestep   : 0116-07-31_22:00:00
 Beginning timestep   : 0116-07-31_23:00:00
 Beginning timestep   : 0116-08-01_00:00:00

 NET WATER FLUXES : period  monthly: date =      1160801           0
                       Time     |      Time    
                     averaged   |    integrated
                   kg/m2s*1e6   |     kg/m2*1e6
            rain     0.00044762 |            1198.92
            snow     0.00000000 |               0.00
            evap    -0.00030546 |            -818.14
          runoff    -0.00009360 |            -250.70
          frzrof     0.00000000 |               0.00
           irrig     0.00000000 |               0.00
           *SUM*     0.00004857 |             130.08

 WATER STATES (kg/m2*1e6): period  monthly: date =      1160801           0
                          Canopy            Snow              SFC             Soil Liq          Soil Ice           Aquifer      Grid-level Err |           TOTAL   
             beg              0.00              0.00              0.00           6794.88              0.00          30350.26                   |           37145.14
             end              0.00              0.00              0.00           6924.96              0.00          30350.26                   |           37275.22
    *NET CHANGE*             -0.00              0.00              0.00            130.08              0.00              0.00              0.00 |             130.08
       *SUM*                                                                      130.08                                                       |             130.08

 NET CARBON FLUXES : period  monthly: date =      1160801           0
                                                              Time     |      Time    
                                                            averaged   |    integrated
                                                          kgC/m2/s*1   |    kgC/m2*1e6
                                  gross primary product     0.44932938 |         1203483.80
                                  ecosystem respiration    -0.42030310 |        -1125739.83
                                            fire C loss     0.00000000 |               0.00
                       excess MR pool harvest mortality     0.00000000 |               0.00
            decomposition loss from 1-year product pool     0.00000000 |               0.00
           decomposition loss from 10-year product pool     0.00000000 |               0.00
          decomposition loss from 100-year product pool     0.00000000 |               0.00
                     SOM C loss from vertical transport     0.00000000 |               0.00
                                             SOM C loss     0.00000000 |               0.00
              flux to atmosphere due to dynamic weights     0.00000000 |               0.00
             seed source to leaf due to dynamic weights     0.00000000 |               0.00
       seed source to dead steam due to dynamic weights     0.00000000 |               0.00
                                                  *SUM*     0.02902628 |           77743.98

 CARBON STATES (kgC/m2*1e6): period  monthly: date =      1160801           0

                                        beg              end          |        *NET CHANGE*
                    PFT          95034424.73         95236168.52      |          201743.78
                    CWD          32337285.73         32266659.24      |          -70626.49
           Total litter           3252239.58          3212373.58      |          -39866.00
              Total SOM         152079679.45        152066172.14      |          -13507.31
     Total wood product                 0.00                0.00      |               0.00
        Truncation sink                 5.78                5.78      |               0.00
      Crop seed deficit                 0.00                0.00      |               0.00
               Grid-level Err           0.00               -0.01      |              -0.01
           *SUM*                           77743.97                   |           77743.97
 time integrated flux =    77743.976805055776     
 net change in state  =    77743.966801085058     
 error                =    1.0003970717662014E-002
 ENDRUN:ERROR in /E3SM/components/elm/src/biogeochem/CNPBudgetMod.F90 at line 916
 ERROR: Unknown error submitted to shr_abort_abort.

This happened during the second OLMT spin-up run.
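To put the size of this second mismatch in context, the sketch below redoes the arithmetic from the monthly carbon report above. The pool changes alone sum to about 77743.98, which matches the integrated flux to the printed precision, and the reported error of about 0.01 is the same size as the Grid-level Err entry. That suggests, but does not prove, that the residual sits in that row. The variable names are illustrative and the tolerance ELM actually applies is not shown in this thread.

```python
# Net changes copied from the monthly CARBON STATES table above (kgC/m2 * 1e6).
pool_changes = {
    "PFT": 201743.78,
    "CWD": -70626.49,
    "Total litter": -39866.00,
    "Total SOM": -13507.31,
}
grid_level_err = -0.01

# Summary lines copied from the end of the same report.
time_integrated_flux = 77743.976805055776
net_change_in_state = 77743.966801085058

pools_only = sum(pool_changes.values())      # change in the carbon pools alone
with_grid_err = pools_only + grid_level_err  # including the Grid-level Err row

error = time_integrated_flux - net_change_in_state
relative_error = error / abs(time_integrated_flux)

print(f"pools only          = {pools_only:.2f}")
print(f"pools + grid err    = {with_grid_err:.2f}")
print(f"error               = {error:.6e}")
print(f"relative error      = {relative_error:.3e}")
```

Relative to the integrated flux this residual is many orders of magnitude smaller than the earlier mismatches, so it looks like a different problem from the carried-over accumulator.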

I set up the simple 3-case test run as:

singularity exec -B $e3sm_src:/E3SM/ -B $host_input_data:/inputdata -B $host_output_dir:/output/ -B \
./OLMT/:/tools/OLMT/ $container /bin/sh -c 'cd /tools/OLMT && python site_fullrun.py \
--site US-Dk3 --sitegroup AmeriFlux --caseidprefix US-Dk3_testrun --nyears_ad_spinup 200 \
--nyears_final_spinup 400 --tstep 1 --spinup_vars --cpl_bypass --compiler gnu \
--mpilib openmpi --gswp3 --model_root /E3SM --caseroot /output \
--ccsm_input /inputdata --runroot /output --no_submit --np 1 --machine modex'

peterdschwartz commented 2 years ago

@serbinsh CNPBudget doesn't seem to be initialized at the start of runs the same way the water budgets are, so this may be an initialization issue in general. That could also explain why the mismatch varies a lot even when the submitted jobs are identical (I'm not sure if this is what you meant when you said it varies).

For the fix, some of the budget restart info may be lost by zeroing out after reading some in, which could cause the discrepancy with this fix. I would try putting the fix at Line 514 (for maint-2.0 branch) of elm_initializeMod instead and see how that works.

Also, I don't think the call to get_nstep() is necessary. Instead, change the existing if(do_budgets) call WaterBudget_Reset('all') to this:

if (do_budgets) then
   call WaterBudget_Reset('all')
   if (use_cn) then
      call CNPBudget_Reset('all')
   end if
end if

This is more in alignment with the calls in elm_driver. I can try and reproduce the error with your command as well.

serbinsh commented 2 years ago

@peterdschwartz Thanks for the helpful feedback! This makes sense. In addition, I feel I may also have created issues by using an incorrect branch? I have been working with the release version of v2 not the maint-2.0 branch. Should I instead start from the maint-2.0 branch? I see those versions have more recent updates so perhaps that would be my best bet.

peterdschwartz commented 2 years ago

@serbinsh Sorry for the mixup. There have been at least a couple of fixes related to budgets since the v2 release, so seeing if the issue is still present would be helpful. If you still need to use the v2 release, then we can cherry pick those fixes onto a branch off of v2 as needed.

serbinsh commented 2 years ago

@peterdschwartz sounds good. No, I do not specifically need to use release v2; I was just trying to use a version I could cite and which would not be changing as rapidly as the master branch. I will switch to maint-2.0 and give that a try. I will let you know what I find out.

Again thank you very much for the help and feedback.

serbinsh commented 2 years ago

OK, I built some test cases with the maint-2.0 branch. Again, no issues with the initial spin-up; however, I am still running into a budget error in year 2 of the second spin-up:

This happened both with and without running Dan's adjust restart code.


commit ca0a6948188d49a919c9ec9b2e085cabed6e4b13 (HEAD -> fasst_branch)
Author: Shawn P. Serbin <sserbin@bnl.gov>
Date:   Fri May 20 08:27:43 2022 -0400

    Removed old modex machine defs

commit 701966697d9b764839bcc454ea5420b6d0f12441 (origin/maint-2.0, maint-2.0)
Merge: 2463015404 907ca25c72
Author: Robert Jacob <jacob@anl.gov>
Date:   Tue May 3 18:25:53 2022 -0500

    Merge 'sarats/scripts/run-e3sm-maint2.0' (PR #4930)

    Replace default case name and case group
    This is required to avoid unwarranted usage of case_group that
    interferes with aggregation of various production simulation campaigns.

    Similar to #4922 for maint-2.0 instead of master branch


and get

 Beginning timestep   : 0002-01-01_00:00:00

 NET WATER FLUXES : period  monthly: date =        20101           0
                       Time     |      Time    
                     averaged   |    integrated
                   kg/m2s*1e6   |     kg/m2*1e6
            rain     0.00022450 |             601.29
            snow     0.00000827 |              22.16
            evap    -0.00005790 |            -155.07
          runoff    -0.00004018 |            -107.62
          frzrof     0.00000000 |               0.00
           irrig     0.00000000 |               0.00
           *SUM*     0.00013469 |             360.76

 WATER STATES (kg/m2*1e6): period  monthly: date =        20101           0
                          Canopy            Snow              SFC             Soil Liq          Soil Ice           Aquifer      Grid-level Err |           TOTAL   
             beg              0.00              0.00              0.00           6156.08              0.00          30353.88                   |           36509.95
             end              0.00              0.00              0.00           6556.08              1.80          30312.83                   |           36870.71
    *NET CHANGE*              0.00              0.00              0.00            400.00              1.80            -41.05              0.00 |             360.76
       *SUM*                                                                      360.76                                                       |             360.76

 NET WATER FLUXES : period   annual: date =        20101           0
                       Time     |      Time    
                     averaged   |    integrated
                   kg/m2s*1e6   |     kg/m2*1e6
            rain     0.00021904 |            6907.72
            snow     0.00000665 |             209.68
            evap    -0.00018103 |           -5709.07
          runoff    -0.00004454 |           -1404.71
          frzrof     0.00000000 |               0.00
           irrig     0.00000000 |               0.00
           *SUM*     0.00000011 |               3.62

 WATER STATES (kg/m2*1e6): period   annual: date =        20101           0
                          Canopy            Snow              SFC             Soil Liq          Soil Ice           Aquifer      Grid-level Err |           TOTAL   
             beg              0.00              0.00              0.00           6451.41             65.38          30350.31                   |           36867.09
             end              0.00              0.00              0.00           6556.08              1.80          30312.83                   |           36870.71
    *NET CHANGE*              0.00              0.00              0.00            104.67            -63.57            -37.48              0.00 |               3.62
       *SUM*                                                                        3.62                                                       |               3.62

 NET WATER FLUXES : period all_time: date =        20101           0
                       Time     |      Time    
                     averaged   |    integrated
                   kg/m2s*1e6   |     kg/m2*1e6
            rain     0.00022395 |         1419577.68
            snow     0.00000652 |           41304.49
            evap    -0.00016411 |        -1040246.64
          runoff    -0.00006491 |         -411462.28
          frzrof    -0.00000000 |              -0.00
           irrig     0.00000000 |               0.00
           *SUM*     0.00000145 |            9173.25

 WATER STATES (kg/m2*1e6): period all_time: date =        20101           0
                          Canopy            Snow              SFC             Soil Liq          Soil Ice           Aquifer      Grid-level Err |           TOTAL   
             beg              0.00              0.00              0.00           6451.41             65.38          30350.31                   |           36867.09
             end              0.00              0.00              0.00           6556.08              1.80          30312.83                   |           36870.71
    *NET CHANGE*              0.00              0.00              0.00            104.67            -63.57            -37.48              0.00 |               3.62
       *SUM*                                                                        3.62                                                       |               3.62

 NET CARBON FLUXES : period  monthly: date =        20101           0
                                                              Time     |      Time    
                                                            averaged   |    integrated
                                                          kgC/m2/s*1   |    kgC/m2*1e6
                                  gross primary product     0.09193934 |          246250.34
                                  ecosystem respiration    -0.15565072 |         -416894.89
                                            fire C loss     0.00000000 |               0.00
                       excess MR pool harvest mortality     0.00000000 |               0.00
            decomposition loss from 1-year product pool     0.00000000 |               0.00
           decomposition loss from 10-year product pool     0.00000000 |               0.00
          decomposition loss from 100-year product pool     0.00000000 |               0.00
                     SOM C loss from vertical transport     0.00000000 |               0.00
                                             SOM C loss     0.00000000 |               0.00
              flux to atmosphere due to dynamic weights     0.00000000 |               0.00
             seed source to leaf due to dynamic weights     0.00000000 |               0.00
       seed source to dead steam due to dynamic weights     0.00000000 |               0.00
                                                  *SUM*    -0.06371138 |         -170644.56

 CARBON STATES (kgC/m2*1e6): period  monthly: date =        20101           0

                                        beg              end          |        *NET CHANGE*
                    PFT          95663744.31         95385964.36      |         -277779.94
                    CWD          30817993.17         30879169.84      |           61176.67
           Total litter           2879035.68          2914121.07      |           35085.39
              Total SOM         149030475.72        149041349.05      |           10873.33
     Total wood product                 0.00                0.00      |               0.00
        Truncation sink                 5.78                5.78      |               0.00
      Crop seed deficit                 0.00                0.00      |               0.00
         Grid-level Err                 0.00               -0.00      |              -0.00
           *SUM*                         -170644.56                   |         -170644.56

 NET CARBON FLUXES : period   annual: date =        20101           0
                                                              Time     |      Time    
                                                            averaged   |    integrated
                                                          kgC/m2/s*1   |    kgC/m2*1e6
                                  gross primary product     0.28608304 |         9021914.75
                                  ecosystem respiration    -0.28763452 |        -9070842.29
                                            fire C loss     0.00000000 |               0.00
                       excess MR pool harvest mortality     0.00000000 |               0.00
            decomposition loss from 1-year product pool     0.00000000 |               0.00
           decomposition loss from 10-year product pool     0.00000000 |               0.00
          decomposition loss from 100-year product pool     0.00000000 |               0.00
                     SOM C loss from vertical transport     0.00000000 |               0.00
                                             SOM C loss     0.00000000 |               0.00
              flux to atmosphere due to dynamic weights     0.00000000 |               0.00
             seed source to leaf due to dynamic weights     0.00000000 |               0.00
       seed source to dead steam due to dynamic weights     0.00000000 |               0.00
                                                  *SUM*    -0.00155148 |          -48927.54

 CARBON STATES (kgC/m2*1e6): period   annual: date =        20101           0

                                        beg              end          |        *NET CHANGE*
                    PFT          95489502.85         95385964.36      |         -103538.48
                    CWD          30825320.47         30879169.84      |           53849.37
           Total litter           2917330.15          2914121.07      |           -3209.08
              Total SOM         149037378.39        149041349.05      |            3970.66
     Total wood product                 0.00                0.00      |               0.00
        Truncation sink                 5.78                5.78      |               0.00
      Crop seed deficit                 0.00                0.00      |               0.00
         Grid-level Err                 0.00               -0.00      |              -0.00
           *SUM*                          -48927.54                   |          -48927.54

 NET CARBON FLUXES : period all_time: date =        20101           0
                                                              Time     |      Time    
                                                            averaged   |    integrated
                                                          kgC/m2/s*1   |    kgC/m2*1e6
                                  gross primary product     0.28186364 |      1786659224.47
                                  ecosystem respiration    -0.27713209 |     -1756667169.00
                                            fire C loss    -0.00007609 |         -482286.95
                       excess MR pool harvest mortality     0.00000000 |               0.00
            decomposition loss from 1-year product pool     0.00000000 |               0.00
           decomposition loss from 10-year product pool     0.00000000 |               0.00
          decomposition loss from 100-year product pool     0.00000000 |               0.00
                     SOM C loss from vertical transport     0.00000000 |               0.00
                                             SOM C loss     0.00000000 |               0.00
              flux to atmosphere due to dynamic weights     0.00000000 |               0.00
             seed source to leaf due to dynamic weights     0.00000000 |               0.00
       seed source to dead steam due to dynamic weights     0.00000000 |               0.00
                                                  *SUM*     0.00465547 |        29509768.52

 CARBON STATES (kgC/m2*1e6): period all_time: date =        20101           0

                                        beg              end          |        *NET CHANGE*
                    PFT          95489502.85         95385964.36      |         -103538.48
                    CWD          30825320.47         30879169.84      |           53849.37
           Total litter           2917330.15          2914121.07      |           -3209.08
              Total SOM         149037378.39        149041349.05      |            3970.66
     Total wood product                 0.00                0.00      |               0.00
        Truncation sink                 5.78                5.78      |               0.00
      Crop seed deficit                 0.00                0.00      |               0.00
         Grid-level Err                 0.00               -0.00      |              -0.00
           *SUM*                          -48927.54                   |          -48927.54
 time integrated flux =    29509768.518643588     
 net change in state  =   -48927.537501010462     
 current state        =    3.4962232990375397     
 relative error [%]   =    10.624193529657088     
 ENDRUN:ERROR in /E3SM/components/elm/src/biogeochem/CNPBudgetMod.F90 at line 929
 ERROR: Unknown error submitted to shr_abort_abort.
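
For anyone checking the numbers by hand: the relative error printed at the end of the log can be reproduced from the all_time tables alone. The sketch below is plain arithmetic on the printed values, assuming the error is the gap between the time-integrated flux and the net change in state, normalized by the sum of the "end" column. That formula is inferred from the log, not taken from CNPBudgetMod.F90, and the variable names are illustrative.

```python
# Values copied from the all_time block of the log (units: kgC/m2*1e6).
time_integrated_flux = 29509768.518643588
net_change_in_state = -48927.537501010462

# "end" column of the all_time CARBON STATES table.
end_states = {
    "PFT": 95385964.36,
    "CWD": 30879169.84,
    "Total litter": 2914121.07,
    "Total SOM": 149041349.05,
    "Total wood product": 0.00,
    "Truncation sink": 5.78,
    "Crop seed deficit": 0.00,
}

# Assumed formula (inferred from the log, not from the ELM source):
# |integrated flux - net state change| relative to the total end state.
total_end_state = sum(end_states.values())
budget_error = time_integrated_flux - net_change_in_state
relative_error_pct = 100.0 * abs(budget_error) / total_end_state

print(f"budget error       = {budget_error:.2f}")
print(f"total end state    = {total_end_state:.2f}")
print(f"relative error [%] = {relative_error_pct:.4f}")
```

With the values above this comes out at about 10.62%, in line with the printed relative error. Note that the all_time net change is identical to the annual one in this log, while the all_time integrated flux is far larger than the annual one, so the two sides of the budget appear to cover different periods.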

serbinsh commented 2 years ago

I am trying to put in your suggested code changes, but so far it doesn't seem to be making a difference; perhaps the additional condition is not even being used.

peterdschwartz commented 2 years ago

Thanks for testing that out. I will have to reproduce it on a machine I have access to, using the command you posted earlier, and look into it further.

serbinsh commented 2 years ago

OK, thanks. I will also keep experimenting.

The fundamental cases I am trying to run are: an initial spin-up with ICB1850CNRDCTCBC; then a second ICB1850CNRDCTCBC spin-up using the output of the first (this is where I hit a C budget error); and after that, ICB20TRCNPRDCTCBC.

I am setting up cases with OLMT. However, I have the same issues with basic shell scripts that set up and run cases very similar to those built by OLMT.

serbinsh commented 2 years ago

I should also mention that I get the errors whether or not I adjust the restarts, but when adjusting I do see this message:

[sserbin@modex FASSt-simulation]$ singularity exec -B $e3sm_src:/E3SM/ -B /data/Model_Data/cesm_input_datasets:/inputdata \
> -B /data2/Model_Output/olmt_test/:/output \
> containers/elm-builds:release-v2.0.0-latest.sif /bin/sh -c 'cd /tools/OLMT/ && \
> python adjust_restart.py --rundir /output/US-Dk3_testrun_US-Dk3_ICB1850CNRDCTCBC_ad_spinup/run \
> --casename US-Dk3_testrun_US-Dk3_ICB1850CNRDCTCBC_ad_spinup --restart_year 201 --model_name elm'
adjust_restart.py:96: UserWarning: Warning: converting a masked element to nan.
  if (float(rest_vals[i]) > 0.0 and float(hist_vals[0][i]) < 1.0e10 and float(hist_vals[0][i]) > 0.001):
[sserbin@modex FASSt-simulation]$

rljacob commented 2 years ago

What do you mean by "a second spinup with the output of the first" ? How long are each of these runs?

serbinsh commented 2 years ago

I'm following Dan's protocol from his OLMT tool. I have tried a number of different versions of ELM at this point, including v1.2, v2, and v2 maint, and am now trying a new branch recommended by Fengming: https://github.com/fmyuan/E3SM/tree/elm_v2-for-ngee

My initial testing with this version suggests I am not running into the restart issues.

The protocol is: 200 initial spinup, 400 final spinup, then transient. That said, I did try longer initial spin-ups but still ran into the error.

Would you suggest a modification to the protocol? Again thanks and I hope I am not creating my own issues with this!
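
The protocol above can be written down as a small table of stages, which makes the hand-off year between runs explicit. This is only a sketch: the Stage class and restart_year helper are hypothetical names, the stage lengths are assumed to be model years, each stage is assumed to start at year 1, and the transient length is left unset because it is not stated in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    """One run in the spin-up sequence (hypothetical helper, not part of OLMT)."""
    label: str
    compset: str
    years: Optional[int]  # None = length not stated in the thread


# Compsets and lengths as described in the comments above.
stages = [
    Stage("initial spin-up", "ICB1850CNRDCTCBC", 200),
    Stage("final spin-up", "ICB1850CNRDCTCBC", 400),
    Stage("transient", "ICB20TRCNPRDCTCBC", None),
]


def restart_year(stage: Stage) -> Optional[int]:
    """First model year after the stage ends, assuming the stage starts at year 1."""
    return None if stage.years is None else stage.years + 1


for stage in stages:
    length = "unspecified" if stage.years is None else f"{stage.years} years"
    print(f"{stage.label:16s} {stage.compset:20s} {length:12s} "
          f"restart year: {restart_year(stage)}")
```

Under these assumptions the first stage hands off at year 201, which matches the --restart_year 201 passed to adjust_restart.py earlier in the thread.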

serbinsh commented 2 years ago

Confirmed that at least for a single ensemble (default param) run at US-Dk2 with OLMT and using E3SM https://github.com/fmyuan/E3SM/tree/elm_v2-for-ngee version of ELM, I was able to get through all three cases and get transient run output, e.g.

[Screenshot: Screen Shot 2022-05-23 at 12 16 29 PM]

FYI @fmyuan

bpbond commented 2 years ago

@BunnyVon Could this be related to #4971 ?

BunnyVon commented 2 years ago

I don't think so. The larger relative error in carbon mass shown in #4971 has only appeared once in the first 30 years (where the run is at right now). However, is this issue related to ELM used in master or NGD? I use the latest master for BGC v2 simulations so wonder if it's a concern for me as well.

susburrows commented 2 years ago

Hi all, sorry for the slow follow-up. I was on travel last week, and this week my laptop died, which made things a bit chaotic for me.

It looks like this issue has been resolved for now, is that correct @serbinsh ?

Just a couple of quick thoughts/questions -- although much of this may be moot if the issue has been fully resolved:

1) I'm not sure if this is technically an E3SM-supported compset (it's not one of the coupled compsets that the BGC group was responsible for, but I'm less familiar with the land compsets). @rljacob may be able to clarify.
2) @bishtgautam worked on the ELM budget diagnostics and should be able to help check if it is an issue with the diagnostics themselves, or with the code, if needed.
3) there should be a v2.1 release soon-ish, which would become citable for publication purposes. In light of this, you might consider starting from the latest master instead of maint-2.0.0. I'm not certain if the maintenance branch has all the bug fixes that have been added on master since the 2.0 tag, or not, although again, Rob may be able to comment further.
4) @serbinsh as a general comment for future issue reports, reading through the github thread I was often having trouble identifying, at different stages in the thread, (1) which git hash you are referring to, (2) what machine you ran on, and (3) how another person would independently reproduce a particular error message (e.g., the command or runscript to reproduce it out-of-the-box). Providing this information will make it easier for others to assist, and it also makes the github thread more useful to others who may run into similar issues in the future, and find this thread. Apologies if I'm telling you something you already know!

rljacob commented 2 years ago

For external users, we only support the production compsets. I-cases are used internally for development but I1850GSWELMBGC isn't in our test suite so must not be considered important.

bishtgautam commented 2 years ago

@serbinsh Can you please try out the bishtgautam/lnd/fix-budgets branch to see if it fixes this issue?