Closed sharma-bharat closed 1 year ago
@sharma-bharat the trigger for this error is due to the fact that the error
is NaN
which is one of the checks that generates this endrun
call. The source of the NaN
is the litter_stock
which is determined by a call to PatchMassStock
. This variable is just a summation of the current patch litter times the patch area. I'm guessing there must be some issue in the litter initialization. @rgknox any initial thoughts?
Can you provide some more details about the case you are running and the particular versions of fates?
The error triggers immediately after the event, which suggests to me that the problem is directly tied to the event itself.
Here is the call to the balance check, which the log indicated, was call index 3: https://github.com/NGEET/fates/blob/sci.1.61.0_api.25.0.0/main/EDMainMod.F90#L284
Note that the check fails right after spawn_patches. So there must be some problem with the litter (also noting what @glemieux said) being generated from the disturbance event.
Here is the call to litter generation from a logging event: https://github.com/NGEET/fates/blob/sci.1.61.0_api.25.0.0/biogeochem/EDPatchDynamicsMod.F90#L608-L609
I would put print statements in that routine (or run a debugger), wherever we update the new_litter% and cur_litter% terms. The print statement will very often fail if trying to print a nan or inf, so keep that in mind too. Also, updating the xml file in your case director to debug mode might help:
./xmlchange DEBUG=True
Here is the first place that the litter gets updated: https://github.com/NGEET/fates/blob/sci.1.61.0_api.25.0.0/biogeochem/EDLoggingMortalityMod.F90#L607-L610
I'm not sure if this is relevant but logging events will throw a mass balance error with values of fates_mort_disturb_frac
other than 1.0. @jkshuman and I identified this some time ago. Unfortunately, I never completed an issue for this problem as promised in issue #714. I could look back though my notes for more details if you think this could be related to your issue.
@JoshuaRady thanks, that's an interesting point. In principle the fates_mort_disturb_frac
parameter probably shouldn't apply to logging. FATES should probably assume that all logged area is disturbed, just as it also assumes that all burned area is disturbed.
A comment in response to immediately above, I don't think fates_mort_disturb_frac
is applied to logging, see here. So perhaps someone has changed that in the interim. In any case it was worth looking into and we get logging event code crashes with fates_mort_disturb_frac
ranging from 0 to 1 (for certain combinations of logging related parameters).
@sharma-bharat has been digging into the logging code and logging event failures that cause the code to crash. Our heads are in knots a little but I think we're getting an idea of what's going on.
The proximal issue is that with a single or annual logging event disturbance_rate
here is 1. That means the new patch area is equal to the donor patch area. So the old patch area (remainder_area
) becomes zero and when partitioning litter fluxes you get a divide by zero giving a NaN and throwing the error that crashes the model.
This can be fixed by writing a catch into the litter partitioning logic that zeros litter fluxes when remainder_area == 0
rather than calculating them.
I think that's worth doing for robustness but it doesn't answer why we get disturbance_rate == 1.0
when our logging parameter disturbance fractions sum < 1. I think we've traced it to the interface with the hlm/luh harvesting code. If the HLM input data requires logging I think it passes a switch and a harvest_rate
parameter that defines the harvested/human disturbed area fraction. We then calculate the direct, collateral, mechanical logged disturbance by multiplying lmort_direct
, lmort_collateral
, lmort_infra
each by harvest_rate
and then to be sure the human disturbed area from the HLM is maintained we calculate the degraded fraction as l_degrad = harvest_rate - (lmort_direct + lmort_infra + lmort_collateral)
.
This is all good but when we initiate an annual logging event from the parameter file, we set harvest rate to 1. Combined with the above disturbance fraction logic, which remains the same, this means that when we initiate an annual logging event all of our area is disturbed.
I don't think that's the logic we want right? I'm not sure what degradation disturbance is supposed to represent in the case where we initiate a logging event from the param file. Seems like collateral and mechanical should cover that?
I think it's a fairly easy fix, I just don't know exactly what logic we want. The fix can go here and look something like:
if (canopy_layer .eq. 1 .and. hlm_use_lu_harvest == itrue) then
l_degrad = harvest_rate - (lmort_direct + lmort_infra + lmort_collateral) ! fraction passed to 'degraded' forest.
else
l_degrad = 0._r8
endif
Or add an additional case to represent the degraded disturbance logic for a logging event initiated from the parameter file.
TL;DR: annual logging event from the param file causes all of a patch to be disturbed, resulting in a divide by zero in the new patch generation logic. Can fix this by 1) changing the harvest_rate / degraded land logic for a parameter file initiated logging event and 2) making the new patch creation logic robust to a zero remainder area (poss. only during a logging event).
@ckoven @rgknox @jkshuman @glemieux @JoshuaRady -- let us know what you think and we'll implement the fixes. At the least fix 1) poss 2).
I don't think that's the logic we want right? I'm not sure what degradation disturbance is supposed to represent in the case where we initiate a logging event from the param file. Seems like collateral and mechanical should cover that?
@walkeranthonyp I think the key thing is to distinguish what is meant by each of the terms here.
harvest_rate
is meant to be the total area (over the timestep) that the harvest treatment is applied over,
lmort_direct
is the fraction of trees within the area harvested that are harvested,
likewise for lmort_infra
and lmort_collateral
.
l_degrad
is meant to be the remaining part of harvest_rate
where the plants aren't killed for any reason during the harvest event, but the land becomes considered as degraded forest. Another way of thinking about this is that a logging treatment shouldn't result in any change in the overall composition of the patches that remain unharvested, their area should just decrease to be 1- harvest_rate
times their prior area. The separation of patch area lumped into the l_degrad
term is what allows this even in the case of a selective harvest, otherwise all the unharvested trees within the harvest area would still be left behind in the unharvested patches that remain and so the composition of the unharvested patches would change, in addition to their area.
In the case of the event-based logging, since the assumption is that it is applied to the entire area, I can see how the l_degrad
logic may be unnecessary: all of the land should then be considered degraded forest because it was all within the harvest treatment area. So I think I support what you are proposing, it should revert the behavior back to what it was in https://bg.copernicus.org/articles/17/4999/2020/ .
@walkeranthonyp I'm still trying to think this through some more in real time. The one concern that occurs to me is the subsequent dynamics due to patch labeling (which wasn't yet a thing in FATES as of https://bg.copernicus.org/articles/17/4999/2020/ ). If you set l_degrad
to zero when the event-based logging is enabled, whatever land is not disturbed by the logging (either because it is underneath the crowns of trees that were not killed by direct + infra + collateral, or because the canopy is not closed by woody plants and so it was ground not occupied by any trees to begin with) will still have a patch label of primaryforest
, whereas what is logged will be secondaryforest
. So then that would mean those areas cannot subsequently be fused. So we may need to come up with an alternate solution that doesn't do that.
For what we want to do with FACE simulations we actually want to clear the whole area, so it would all be labeled secondaryforest
.
But for the more broad case where not the whole area is disturbed by logging - would what you suggest be a problem? Would it get messy with successive logging events, the code wouldn't know where to apply them or something.
@walkeranthonyp so long as your canopy is completely closed, I think it should work fine. And yeah, maybe it shouldn't matter in the general case what the labels are at the site scale anyway, they really only get used in the larger-scale sense through the separate drivers for subsequent logging in order to allow large-scale forest rotations. Maybe give it a try and just make sure nothing weird happens?
As @ckoven confirmed above the addition of l_degrad
a few years back means that any logging event from the parameter file will result in 100% of the patch being disturbed, regardless of logging settings. The logging removals are applied and everything that remains is placed into a new patch (1 old patch -> 1 new patch).
If you want to split the patch with logging, the change you propose should allow this but I would proceed with caution. In addition to the primary vs. secondary issue mentioned by @ckoven, there may be other issues. I reviewed the code at some point in the past and I recall I had some concerns that the mass partitioning code might not work as expected if the patch is split. Sorry to be vague but I haven't found my notes on that. I just know I had to do a lot of careful debugging when I implemented patch splitting in the VM Module.
Additionally, when a patch is split after a harvest the canopy can undergo rearrangement that leads to unpleasant and unrealistic side effects (again see issue #714).
Sorry, my comment above was too slow and @walkeranthonyp's subsequent post renders part it moot.
@walkeranthonyp, I think for what you want the current behavior is what you want. If you change l_degrad
= 0 I think you risk splitting the patch if trees don't occupy 100% of the area, which is basically what @ckoven said in reverse.
Clearly the litter divide by zero issue should be fixed.
Thanks for the rapid comments @ckoven @JoshuaRady. I'm somewhat agnostic to what we do with harvest_rate
. My goal is to get my head around how this code is supposed to work, especially in relation to logging events specified in the parameter file, and try to avoid messing something up.
From your comments and reading of the code, logging enacted from the param file is supposed to affect 100% of the area so that all of a logged patch becomes a new patch. But damage to plants only happens through lmort_direct
, lmort_collateral
, lmort_infra
. All l_degrad
does is to bring the part of the patch not damaged by logging into the new patch. Is that accurate (or close enough)? If so I have two questions:
l_degrad
is high? That seems inconsistent with the intention of the land use dataset which I guess assumes that all of the area defined by harvest_rate
is damaged in some way - or is l_degrad
always low in these simulations and/or lmort_collateral
, lmort_infra
and lmort_direct
are controlled by the HLM / land use dataset? I have a fix for this that's working by catching when remainder_area
is equal to zero in logging_litter_fluxes
. Looking at the code now, I see there are similar catches for fire_litter_fluxes
and mortality_litter_fluxes
. I will align my edits with those and make a PR.
My Q's above still stand. Thinking more on this, by setting harvest_rate = 1
the current logging event behaviour is functionally equivalent to setting fates_mort_disturb_frac = 0
. Is that the behaviour we're going for? Wouldn't that mean canopy gaps caused by logging could close v rapidly due to gap filling done according to PPA rather than gap-phase regrowth?
My Q's above still stand. Thinking more on this, by setting
harvest_rate = 1
the current logging event behaviour is functionally equivalent to settingfates_mort_disturb_frac = 0
. Is that the behaviour we're going for? Wouldn't that mean canopy gaps caused by logging could close v rapidly due to gap filling done according to PPA rather than gap-phase regrowth?
This discussion will continue in #1003.
Hi all, I’m on a recent commit of main branch (7f448b09 sci.1.65.2_api.25.4.0) that includes the fix from PR #996 but I’m seeing (nearly) the same error as described above - NaN in mass balance checks following event based logging.
I’m trying to simulate a one time stand clearing disturbance (all trees in all patches dead) and total logging seemed like the easiest way to go.
My fail has call index 4, rather than 3, so it is being triggered here after patch fusion (the TotalBalanceCheck after spawn_patches seems to pass ok).
Looks like a patch with very small negative area was somehow created?
Logging Event Enacted on date: 01-01-1993
See luc mortalities: 1.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000
See luc mortalities: 1.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000
mass balance error detected
element type (see PRTGenericMod.F90): 1
error fraction relative to biomass stock: 0.0000000000000000
absolut error (flux in - change): NaN
call index: 4
Element index (PARTEH global): 1
net: -22.672758641509233
dstock: NaN
seed_in: 219.17808219178085
net_root_uptake: 0.0000000000000000
gpp_acc: 98.886479060820022
flux_generic_in: 0.0000000000000000
wood_product: 0.0000000000000000
error from patch resizing: 1.9561048570576513E-010
burn_flux_to_atm: 0.0000000000000000
seed_out: 0.0000000000000000
flux_generic_out: 0.0000000000000000
frag_out: 257.97276723005825
aresp_acc: 82.764552664247461
error=net_flux-dstock: NaN
biomass 33859.880998928049
litter NaN
seeds NaN
total stock NaN
previous total 721358.60047075246
lat lon 9.2500000000000000 280.25000000000000
patch area: -6.2527760746888816E-013
AG CWD: NaN
BG CWD (by layer): NaN NaN NaN NaN \
NaN NaN NaN NaN NaN NaN
leaf litter: NaN
root litter (by layer): NaN NaN NaN NaN \
NaN NaN NaN NaN NaN NaN
anthro_disturbance_label: 1
use_this_pft: 1 1 1 1
Maybe I missed something in the discussion above. Is it possible to allow fates_landuse_logging_direct_frac = 1.0 and fates_landuse_logging_dbhmin = 0.0?
Hi @JessicaNeedham . These options (fates_landuse_logging_direct_frac = 1.0 and fates_landuse_logging_dbhmin = 0.0) work fine in my cases (global fixed_biogeog with no comp
).
Thanks @sshu88 , that's good to know. I only see this error with the logging event on certain dates in full FATES. I guess a certain patch structure prior to logging must be triggering it.
We (Dr. Anthony Walker and I) were trying to perform logging on a specific day (icode > 10000) and we are getting a mass balance error, as follows:
In the log file, it looks like the abs error is a NaN. That would likely be causing the mass balance error.
Could you please look into this issue and help us resolve it?
Thanks, Bharat