NGEET / fates

repository for the Functionally Assembled Terrestrial Ecosystem Simulator (FATES)
Other
99 stars 91 forks source link

Numerical Overflows in carbon accounting #168

Closed rgknox closed 7 years ago

rgknox commented 7 years ago

Summary of Issue:

There is a check on total fates carbon balance that occurs once each day when vegetation dynamics is called. This happens in ChecksBalancesMod.F90: FATES_BGC_Carbon_Balancecheck().

This is the routine that updates the variable: sites(s)%cbal_err_fates, this is the variable that blows up. It is blowing up from the variable: sites(s)%fates_to_bgc_this_ts, which is blowing up from: currentPatch%root_litter_out(ft). By blowing up, it suddenly has a value of E+248.

This only occurs on the first time a new patch has been created.

currentPatch%root_litter_out() is zero'd and set in cwd_out(), via:

 do ft = 1,numpft_ed
       currentPatch%leaf_litter_out(ft) = max(0.0_r8,currentPatch%leaf_litter(ft)* SF_val_max_decomp(dg_sf) * &
            currentPatch%fragmentation_scaler )
       currentPatch%root_litter_out(ft) = max(0.0_r8,currentPatch%root_litter(ft)* SF_val_max_decomp(dg_sf) * &
            currentPatch%fragmentation_scaler )
       if ( currentPatch%leaf_litter_out(ft)<0.0_r8.or.currentPatch%root_litter_out(ft)<0.0_r8)then
         write(iulog,*) 'root or leaf out is negative?',SF_val_max_decomp(dg_sf),currentPatch%fragmentation_scaler
       endif
    enddo

The order of operations seems fine. cwd_out is called from vegetation_dynamics, and the balance checks are called from EDBGCDynSummary() which is immediately after vegetation_dynamics in clm_drv().

ed_ecosystem_dynamics -> ed_integrate_state_variables -> non_canopy_derivs() -> cwd_out() EDBGCDynSummary() -> wrap_bgc_summary() -> SummarizeNetFluxes() EDBGCDynSummary() -> wrap_bgc_summary() ->FATES_BGC_Carbon_BalanceCheck()

Expected behavior and actual behavior:

Steps to reproduce the problem (should include create_newcase or create_test command along with any user_nl or xml changes):

What is the changeset ID of the code, and the machine you are using:

have you modified the code? If so, it must be committed and available for testing:

Screen output or output files showing the error message and context:

ckoven commented 7 years ago

does it work to just zero the root_litter_out at the new patch creation step here:? https://github.com/NGEET/ed-clm/blob/master/components/clm/src/ED/biogeochem/EDPatchDynamicsMod.F90#L868

rgknox commented 7 years ago

That seems like a good idea, I will look into that too. I've also tracked down some NaN's being generated in patch%root_litter during cohort termination. Although its not clear yet why the NaN's are occurring, as components used to calculate the value are all reasonable numbers at the time of failure seem very reasonable.

At the end of EDCohortDynamicsMod.F90:terminate_cohorts(), I added a nan catcher for root_litter:

             currentPatch%root_litter(currentCohort%pft) = currentPatch%root_litter(currentCohort%pft) + currentCohort%n* &
                  (currentCohort%br+currentCohort%bstore)/currentPatch%area 

             if(currentPatch%root_litter(currentCohort%pft).ne.currentPatch%root_litter(currentCohort%pft)) then
                print*,'ROOT LITTER PROB',currentCohort%n,currentCohort%br,currentCohort%bstore,currentPatch%area 
                stop
             end if

And it does catch a NaN, but the output is not unusual:

ROOT LITTER PROB   1.5980184922696298E-006   5.3619791069715643E-002  0.10930903672984571       0.36255274778408553   

UPDATE: This was a red herring, I tracked the NaN back to earlier in the code to subroutine mortality_litter_fluxes(), variable currentpatch%canopy_mortality_root_litter(p)

rgknox commented 7 years ago

The problem was traced all the way back to cohort%npp_acc. This value was not being copied correctly during cohort copying, which is used in various places including spawning newly disturbed patches.
PR with fix coming soon.