ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/
Other
308 stars 311 forks source link

problems with new CNFireMod #12

Open billsacks opened 6 years ago

billsacks commented 6 years ago

Bill Sacks < sacks@ucar.edu > - 2013-05-01 15:24:28 -0600 Bugzilla Id: 1679 Bugzilla CC: andre@ucar.edu, bugzillaMuszala@gmail.com, dlawren@ucar.edu, muszala@ucar.edu, mvertens@ucar.edu, rfisher@ucar.edu, slevis@ucar.edu,

This started with a search through the CNFireMod code for potentially-incorrect uses of weights, now that I have introduced 'active' flags. But in looking for this, I noticed a number of questionable things in this code, so decided to review the entire module. Here are some problems I found.

Some of these are cosmetic (e.g., variable names); others are not bugs per se, but make the code fragile to future changes; and others look like bugs (and if they aren't bugs, then they need some explanation about why they're correct even though they look incorrect).

(1) Many variables are declared as "! local pointers to implicit in scalars", when in fact they are implicit out or in/out

(2) There are a number of cryptic variable names, some with no description. For example:

real(r8) :: lh ! real(r8) :: fs ! real(r8) :: ig !

(3) There are a number of places where the code checks whether the current pft is crop or natural veg. This is done like this:

       ! For crop veg types
       if( ivt(p) > nc4_grass )then

       ! For natural vegetation (non-crop)
       if( ivt(p) >= ndllf_evr_tmp_tree .and. ivt(p) <= nc4_grass )then

This is very fragile code, which will break if anyone adds new natural PFTs to the beginning or end of the list. A possible replacement for crop is - I think - to use npcropmin & npcropmax (defined in pftvarcon.F90) - though I'm not sure whether that does the right thing when running without the specific crop types?

(4) Similarly, checks of the type of pft:

                   if( ivt(p) .ge. nbrdlf_evr_shrub )then      !for shurb and grass

... else ! for trees

This assumes that people adding PFTs in the future will add them in the "right" place, where "right" is determined by the logic embedded here. This should probably be handled by adding constants to the pft physiology file.

The idea of someone wanting to add PFTs is not purely theoretical: Andy Jones just told me today that he is adding some PFTs for some of his work. So I am leery of code that makes this difficult or error-prone.

(5) There are some seemingly ad-hoc conditionals, some of which seem incorrect. In particular, there seem to be terms added to conditionals that don't need to (or shouldn't) be there. For example:

(a) ! For crop PFT's if( ivt(p) > nc4_grass .and. wtcol(p) > 0._r8 .and. leafc_col(c) > 0._r8 )then fuelc_crop(c)=fuelc_crop(c) + (leafc(p) + leafc_storage(p) + & leafc_xfer(p))wtcol(p)/cropf_col(c) + & totlitc(c)leafc(p)/leafc_col(c)*wtcol(p)/cropf_col(c) end if

Why should this only be done if leafc_col(c) > 0? Note that the first parenthesized term can be non-zero even if leafc_col(c) == 0, because leafc_storage or leafc_xfer could be > 0.

Also:

(b) if( .not. shr_infnan_isnan(btran2(p)) .and. btran2(p) .le. 1._r8 )then

Why is this check needed? Since this code is in a loop over filter_soilc points, it shouldn't include points that have NaN values; and I can't see how btran2 could be greater than 1 -- and if it ever were greater than 1, I would question whether it's really correct to exclude points that have values greater than 1: is it really correct to treat points with values of 1.0000000001 (maybe due to rounding error??) fundamentally differently from points with values of 1.0?

(6) More intermediate variables (and/or comments) are needed to make it more clear what this code is doing, so people other than the author can modify it in the future. For example, in the example in #5a, I have no idea what this term is trying to accomplish:

                        totlitc(c)*leafc(p)/leafc_col(c)*wtcol(p)/cropf_col(c)

As another example, this equation is very hard to read:

    baf_peatf(c) = boreal_peatfire_c*exp(-SHR_CONST_PI*(max(wf2(c),0._r8)/0.3_r8))* &
    max(0._r8,min(1._r8,(tsoi17(c)-SHR_CONST_TKFRZ)/10._r8))*peatf_lf(c)* &
    (1._r8-fsat(c))

(7) There are many places where long expressions do two things at once: (1) compute some pft-level variable with a complex expression, and (2) average from pft to column -- all in a single line of code. The example in #5a, above, illustrates this.

The code would be clearer and easier to modify if these expressions were split into two pieces: (1) first compute pft-level variable for all quantities, and then (2) average to column, preferably via a call to p2c.

(6) The use of cwtgcell in this code is suspicious to me:

             if( ivt(p) == nbrdlf_evr_trp_tree .and. wtcol(p) .gt. 0._r8 )then
                trotr1_col(c)=trotr1_col(c)+wtcol(p)*cwtgcell(c)
             end if
             if( ivt(p) == nbrdlf_dcd_trp_tree .and. wtcol(p) .gt. 0._r8 )then
                trotr2_col(c)=trotr2_col(c)+wtcol(p)*cwtgcell(c)
             end if
             if( ivt(p) == nbrdlf_evr_trp_tree .or. ivt(p) == nbrdlf_dcd_trp_tree )then
                if(lfpftd(p).gt.0._r8)then
                   dtrotr_col(c)=dtrotr_col(c)+lfpftd(p)*cwtgcell(c)
                end if
             end if

There appears to be no other science code (i.e., non-infrastructure code) that depends on weights on the grid cell. I believe that all code is intended to compute per-area quantities, so a column's weight on the grid cell should be irrelevant in computing a column-level quantity.

If this code is truly correct, it needs more explanation.

(7) Why does the normalization differ for these two averages from pft to col:

          rootc_col(c) = rootc_col(c) + (frootc(p) + frootc_storage(p) + &
                         frootc_xfer(p) + deadcrootc(p) +                &
                         deadcrootc_storage(p) + deadcrootc_xfer(p) +    &
                         livecrootc(p)+livecrootc_storage(p) +           &
                         livecrootc_xfer(p))*wtcol(p)

          fsr_col(c) = fsr_col(c) + fsr_pft(ivt(p))*wtcol(p)/(1.0_r8-cropf_col(c))

i.e., why does the second normalize by (1-cropf_col), but the first does not?

(8) There are lots of magic numbers. For example:

    fire_m   = exp(-SHR_CONST_PI *(m/0.69_r8)**2)*(1.0_r8 - max(0._r8, &
               min(1._r8,(forc_rh(g)-30._r8)/(70._r8-30._r8))))*  &
               min(1._r8,exp(SHR_CONST_PI*(forc_t(g)-SHR_CONST_TKFRZ)/10._r8))
    lh       = 0.0035_r8*6.8_r8*hdmlf**(0.43_r8)/30._r8/24._r8
    fs       = 1._r8-(0.01_r8+0.98_r8*exp(-0.025_r8*hdmlf))
    ig       = (lh+forc_lnfm(g)*0.25_r8)*(1._r8-fs)*(1._r8-cropf_col(c)) 

(9) In this code and some following code, normalizations are sometimes done by (1 - cropf_col), and sometimes by lfwt:

                   if( ivt(p) .ge. nbrdlf_evr_shrub )then      !for shurb and grass
                      lgdp_col(c)  = lgdp_col(c) + (0.1_r8 + 0.9_r8*    &
                                     exp(-1._r8*SHR_CONST_PI* &
                                     (gdp_lf(c)/8._r8)**0.5_r8))*wtcol(p) &
                                     /(1.0_r8 - cropf_col(c))
                      lgdp1_col(c) = lgdp1_col(c) + (0.2_r8 + 0.8_r8*   &
                                     exp(-1._r8*SHR_CONST_PI* &
                                     (gdp_lf(c)/7._r8)))*wtcol(p)/lfwt(c)
                      lpop_col(c)  = lpop_col(c) + (0.2_r8 + 0.8_r8*    &
                                     exp(-1._r8*SHR_CONST_PI* &
                                     (hdmlf/450._r8)**0.5_r8))*wtcol(p)/lfwt(c)

Is lfwt = (1 - cropf_col)?? If so, these normalizations should be made consistent. If not, why do they differ, and why are some normalizations done by one factor and others by another?

(10) The computations of lgdp_col, lgdp1_col and lpop_col are hard to decipher, and could use some explanatory comments and/or code cleanup (e.g., use of intermediate variables) to make it more clear what's going on here.

(11) The following code needs some explanation:

if (fpftdyn /= ' ') then !true when landuse data is used do fc = 1,num_soilc c = filter_soilc(fc) if( dtrotr_col(c) .gt. 0._r8 )then if( kmo == 1 .and. kda == 1 .and. mcsec == 0)then lfc(c) = 0._r8 end if if( kmo == 1 .and. kda == 1 .and. mcsec == dt)then lfc(c) = dtrotr_col(c)dayspyrsecspday/dt end if else lfc(c)=0._r8 end if end do end if

It looks like this is trying to re-initialize lfc at the beginning of each year. However, I'm concerned about whether this will behave correctly if drotr_col(c) == 0 on the first and/or second timesteps of the year. In this case the variables won't be initialized for that year. Maybe that's okay - it's just not clear to me if that's what's intended.

(12) It is odd to me that deforestation fires cannot happen on Jan 1 at midnight:

    !
    ! if landuse change data is used, calculate deforestation fires and 
    ! add it in the total of burned area fraction
    !
    if (fpftdyn /= ' ') then    !true when landuse change data is used
       if( trotr1_col(c)+trotr2_col(c) > 0.6_r8 )then
          if(( kmo == 1 .and. kda == 1 .and. mcsec == 0) .or. &
               dtrotr_col(c) <=0._r8 )then
             fbac1(c)        = 0._r8
             farea_burned(c) = baf_crop(c)+baf_peatf(c)

but perhaps there is some good reason for this

(13) Subroutine CNFireFluxes hard-codes a lot of information about the structure of the CN pools. I believe the same is done elsewhere in the CN code. This makes it very hard for anyone to add or remove a carbon or nitrogen pool, because they have to understand and modify code spread all over the model. I'm not sure what the solution is, but I feel like the current implementation is not sustainable, so this is worth some thought.

billsacks commented 6 years ago

Erik Kluzek < erik@ucar.edu > - 2013-05-01 21:36:16 -0600

Hey Bill

Yep, for what it's worth -- I agree with you. Some of this I tried to address with Fang months ago, but I think the language barrier got in the way enough that she wasn't willing or able to make some of these changes. With more time, a lot of the suggestions in here are things I would've done on my own as well. As it was, I did what I considered the bare-bones necessary changes. So I changed the use of magic numbers for vegetation indices to the indices from pftvarcon, changed obvious magic numbers to constants, fixed indentation, added some comments, broke up long lines, and introduced at least a few important constants. There were some values that Fang was addressing as "the value that was X in the paper" which is a horrible way to define what you are doing.

In terms of 3 and 4 I figured the current was at least acceptable because it's rare to add vegetation types, and I'm sure other parts of the code are fragile in this way as well.

We should be able to fix "1". For "2" we should be able to get Fang's input on. I did try to get this from Fang before, but we probably need to explicitly ask for particular variables. "5" is another to ask of Fang directly. The use of shr_infnan_isnan is very awkward, but wasn't something I figured I had time to mess with. "6", "7" and "10" are things I'm hoping Sam could help us with. "9" I'm hoping both Fang and Sam can clarify.

On "11" I wondered if some of these parts of the code, shouldn't become a part of or subroutine in "pftdynMod". Whether this was right or not wasn't clear to me either, but Fang did her development with transient PFT simulations, so I'm assuming this part of the code was appropriately shook out.

"8" is something I wanted Fang to address and asked her to, but I don't think she understood what we asked. Again with more time, I would've done more of this on my own. Perhaps, Sam and Fang can help us to define what some of the magic numbers are, so that we can use constants and parameters for them. I did at least make the constants double precision, which will make it easier for them to be constants or parameters. But, we need to know enough about what's going on that we can define names for them.

I suspect the reason for "12" has to do with the fact that land-use change happens on Jan/1st on midnight. And possibly that already provides enough of a change that you can't do deforestation on top of that. But, this means the real problem with the statement is the implicit linking of this line with how often pftdynMod is invoked. That's an example of where maybe some of this should be moved to pftdynMod, and maybe there should be methods of pftdynMod that can be provided to make sure other modules

"13" of course will take more group effort.

billsacks commented 6 years ago

Bill Sacks < sacks@ucar.edu > - 2013-05-02 06:22:13 -0600

Erik: Thanks a lot for your detailed reply. I had a feeling you had already done a lot of cleanup of this code, and to be clear, this wasn't at all meant as a criticism of your gatekeeper work (I don't think you took it that way, but just in case...): I don't think the burden for all of this cleanup should fall on the software engineering gatekeeper. Perhaps we need some more discussion of process, though.

I also realized that I had two #6's and two #7's -- whoops. That's a shame, because the second #6 and #7 are some of the more suspicious pieces of code to me, so I don't want them to get lost.

For the sake of future discussion, let's use this numbering:

(6) More intermediate variables (and/or comments) are needed to make it more clear what this code is doing, so people other than the author can modify it in the future. For example, in the example in #5a, I have no idea what this term is trying to accomplish:

totlitc(c)leafc(p)/leafc_col(c)wtcol(p)/cropf_col(c)

As another example, this equation is very hard to read:

    baf_peatf(c) = boreal_peatfire_c*exp(-SHR_CONST_PI*(max(wf2(c),0._r8)/0.3_r8))* &
    max(0._r8,min(1._r8,(tsoi17(c)-SHR_CONST_TKFRZ)/10._r8))*peatf_lf(c)* &
    (1._r8-fsat(c))

(7) There are many places where long expressions do two things at once: (1) compute some pft-level variable with a complex expression, and (2) average from pft to column -- all in a single line of code. The example in #5a, above, illustrates this.

The code would be clearer and easier to modify if these expressions were split into two pieces: (1) first compute pft-level variable for all quantities, and then (2) average to column, preferably via a call to p2c.

--

(14) The use of cwtgcell in this code is suspicious to me:

             if( ivt(p) == nbrdlf_evr_trp_tree .and. wtcol(p) .gt. 0._r8)then
                trotr1_col(c)=trotr1_col(c)+wtcol(p)*cwtgcell(c)
             end if
             if( ivt(p) == nbrdlf_dcd_trp_tree .and. wtcol(p) .gt. 0._r8)then
                trotr2_col(c)=trotr2_col(c)+wtcol(p)*cwtgcell(c)
             end if
             if( ivt(p) == nbrdlf_evr_trp_tree .or. ivt(p) == nbrdlf_dcd_trp_tree )then
                if(lfpftd(p).gt.0._r8)then
                   dtrotr_col(c)=dtrotr_col(c)+lfpftd(p)*cwtgcell(c)
                end if
             end if

There appears to be no other science code (i.e., non-infrastructure code) that depends on weights on the grid cell. I believe that all code is intended to compute per-area quantities, so a column's weight on the grid cell should be irrelevant in computing a column-level quantity.

If this code is truly correct, it needs more explanation.

(15) Why does the normalization differ for these two averages from pft to col:

          rootc_col(c) = rootc_col(c) + (frootc(p) + frootc_storage(p) + &
                         frootc_xfer(p) + deadcrootc(p) +                &
                         deadcrootc_storage(p) + deadcrootc_xfer(p) +    &
                         livecrootc(p)+livecrootc_storage(p) +           &
                         livecrootc_xfer(p))*wtcol(p)

          fsr_col(c) = fsr_col(c) + fsr_pft(ivt(p))*wtcol(p)/(1.0_r8-cropf_col(c))

i.e., why does the second normalize by (1-cropf_col), but the first does not?

billsacks commented 6 years ago

Bill Sacks < sacks@ucar.edu > - 2014-12-15 12:16:25 -0700

To further justify the importance of (14): When I ran a set of simulations with roundoff-level differences in the percent of each landunit on the grid cell, the only piece of source code that led to divergent evolution of the two were these lines in CNFireMod. When I deleted this dependence on col%wtgcell, the two simulations had identical fields at the patch and column-level, and only roundoff-level differences at the grid cell level (as expected). That is to say, this piece of code in (14) seems to differ from everything else in the model, in that it introduces a dependence on the subgrid weights that should not be there.

billsacks commented 6 years ago

Bill Sacks < sacks@ucar.edu > - 2014-12-19 15:22:13 -0700

Also note that there are currently two versions of btran in the code, one of which is specific to fire. If possible, we'd like to change this so that the fire code uses the standard btran.

billsacks commented 6 years ago

Bill Sacks < sacks@ucar.edu > - 2014-12-19 15:27:42 -0700

Ideally, we should also cross-check the code against the paper & tech note describing this.

billsacks commented 6 years ago

Bill Sacks < sacks@ucar.edu > - 2014-12-19 15:35:12 -0700

Dave would like to do two things initially:

(1) Resolve the use of two separate btran values

(2) Fix the column weight on the gridcell problem (#14 from my list) [Bill]

We will then ask Sam to look at some of the other stuff in here.

billsacks commented 6 years ago

Bill Sacks < sacks@ucar.edu > - 2016-09-18 07:32:49 -0600

In clm4_5_12_r195, I have removed the use of col%wtgcell in the calculation of trotr1 and trotr2 - but not yet in the calculation of dtrotr. So this is a partial fix of problem (14) in this bug report.

billsacks commented 6 years ago

Bill Sacks < sacks@ucar.edu > - 2016-12-22 10:51:09 -0700

I started down the path of addressing:

(12) It is odd to me that deforestation fires cannot happen on Jan 1 at midnight

But gave up: The check for the last time step of the year (searching for kmo) happens for a number of things (e.g., initializing burndate, fbac1, farea_burned)... and I'm not sure what exactly needs to be changed and what needs to be kept as is. So I'm not going to address the issue with deforestation fires not happening on the last time step of the year, because I don't understand how this might interact with other parts of the code - and what I might break by addressing this.

billsacks commented 6 years ago

Bill Sacks < sacks@ucar.edu > - 2017-02-03 10:14:48 -0700

I have fixed the remainder of problem (14) in clm4_5_14_r223 (use of cwtgcell)

billsacks commented 4 years ago

(5b) is fixed in ctsm5.1.dev007:

(b) if( .not. shr_infnan_isnan(btran2(p)) .and. btran2(p) .le. 1._r8 )then

Why is this check needed? Since this code is in a loop over filter_soilc points, it shouldn't include points that have NaN values; and I can't see how btran2 could be greater than 1 -- and if it ever were greater than 1, I would question whether it's really correct to exclude points that have values greater than 1: is it really correct to treat points with values of 1.0000000001 (maybe due to rounding error??) fundamentally differently from points with values of 1.0?

ekluzek commented 4 years ago

I know this was an issue long ago in development of the CN fire code. It's a valid question if it's still needed now (the checking for nan). We may need to check with Fang about the check for 1. I think the check implies that points greater than one are considered invalid and ignored. Let me look into the history of this a little more. I think I can find some info. on it.

billsacks commented 4 years ago

I know this was an issue long ago in development of the CN fire code. It's a valid question if it's still needed now (the checking for nan). We may need to check with Fang about the check for 1. I think the check implies that points greater than one are considered invalid and ignored. Let me look into the history of this a little more. I think I can find some info. on it.

Please don't spend any more time on this. I looked into this extensively already a few weeks ago. Based on my analysis of the code and some test runs, the answer was: there were legitimately points this was trying to skip because they were spval. Other than that, points could have btran slightly greater than 1 due to roundoff-level issues, and I am almost certain that those were improperly being ignored. The NaN checks were not needed at all. I have fixed all of this by using appropriate filters in the calculation and application of btran2.