forestgeo / fgeo.biomass

Calculate biomass with allometric equations from the allodb package and ForestGEO data
https://forestgeo.github.io/fgeo.biomass
GNU General Public License v3.0

Basic support for dbh-specific equations (other more specific issues follow this up) #27

Closed maurolepore closed 5 years ago

maurolepore commented 5 years ago

@gonzalezeb,

To support dbh-specific equations the code needs to compare dbh values in the data with the values of dbh_min_cm and dbh_max_cm.

What should the code do with missing values of dbh_min_cm and dbh_max_cm?

library(tidyverse)
#> Warning: package 'purrr' was built under R version 3.5.3
library(allodb)

all <- equations %>% 
  allodb::set_type() %>% 
  select(equation_id, equation_form, matches("dbh.*_cm$"))

setdiff(all, na.omit(all))
#> # A tibble: 25 x 4
#>    equation_id equation_form                    dbh_min_cm dbh_max_cm
#>    <chr>       <chr>                                 <dbl>      <dbl>
#>  1 b45a32      exp(a+b*log(dbh))                       1.5         NA
#>  2 c8c6f1      exp(a+b*log(dbh))                       1.5         NA
#>  3 785080      exp(a+(b*log(dbh)))*645.704*1.05        1.5         NA
#>  4 c94845      a+b*BA                                 NA           NA
#>  5 870336      exp(a+b*log(dbh))                       1.5         NA
#>  6 748d94      exp(a+b*log(dbh))                       1.5         NA
#>  7 5a774d      exp(a+(b*log(dbh)))*419.814*1.22        1.5         NA
#>  8 fc521f      exp(a+(b*(log(pi*dbh))))               NA           NA
#>  9 e44bb9      exp(a+b*(dbh/(dbh+c)))                 NA           40
#> 10 7b7468      exp(a+b*(dbh/(dbh+c)))                 NA           50
#> # ... with 15 more rows

I suggest replacing NA with 0 in dbh_min_cm and with +Inf in dbh_max_cm. I can do it with code -- no need to fix the .csv database unless you want.


# Missing values of `dbh_min_cm` should be replaced with 0

# Bad
dbh <- 1
dbh_min_cm <- NA
dbh >= dbh_min_cm
#> [1] NA

# Good
dbh <- 1
dbh_min_cm <- 0
dbh >= dbh_min_cm
#> [1] TRUE

# Missing values of `dbh_max_cm` should be replaced with +Inf

# Bad
dbh <- 1
dbh_max_cm <- NA
dbh <= dbh_max_cm
#> [1] NA

# Good
dbh <- 1
dbh_max_cm <- +Inf
dbh <= dbh_max_cm
#> [1] TRUE

Created on 2019-03-19 by the reprex package (v0.2.1)
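
A minimal sketch of the replacement suggested above, using base R only. The helper name `fill_dbh_limits()` and the small example table are hypothetical (the three rows copy values from the output above); this is not code from fgeo.biomass or allodb.

```r
# Hypothetical helper: treat a missing lower limit as 0 and a missing
# upper limit as +Inf, so that range comparisons never return NA.
fill_dbh_limits <- function(equations) {
  equations$dbh_min_cm[is.na(equations$dbh_min_cm)] <- 0
  equations$dbh_max_cm[is.na(equations$dbh_max_cm)] <- Inf
  equations
}

# Three rows copied from the tibble printed above
eqn <- data.frame(
  equation_id = c("b45a32", "c94845", "e44bb9"),
  dbh_min_cm = c(1.5, NA, NA),
  dbh_max_cm = c(NA, NA, 40)
)

filled <- fill_dbh_limits(eqn)
filled
```

Because the fix happens in code, the .csv files stay untouched and the original missing values remain visible in the database.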

gonzalezeb commented 5 years ago

There are 23 equations in the equation table with some type of missing dbh range value (NRA, NI, NA).

You can replace all of them if you want, but will the csv file also be updated? Otherwise I will do it.

I will make sure to mention that not just in the code but in our methods.

maurolepore commented 5 years ago

> You can replace all of them if you want, but will the csv file also be updated? Otherwise I will do it.

No, the kind of fix I would do in code is downstream from the .csv files. In my opinion, the more you can fix directly in the .csv files, the better: every new code statement increases the chance of bugs.

maurolepore commented 5 years ago

(Now following up at https://github.com/forestgeo/allodb/issues/83.)

@teixeirak

I'm assuming that the limits imposed by dbh_min/max are inclusive (see example below). Is my assumption correct?

# Inclusive limits
fgeo.biomass:::is_in_range(1, 1, 10)
#> [1] TRUE
fgeo.biomass:::is_in_range(10, 1, 10)
#> [1] TRUE

# Out of bounds
fgeo.biomass:::is_in_range(11, 1, 10)
#> [1] FALSE
fgeo.biomass:::is_in_range(0, 1, 10)
#> [1] FALSE

# Well within the limit
fgeo.biomass:::is_in_range(5, 1, 10)
#> [1] TRUE

Created on 2019-03-20 by the reprex package (v0.2.1)
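
For reference, an inclusive range check can be sketched in one line. The function `in_range()` below is a hypothetical stand-in, not the actual implementation of the internal `fgeo.biomass:::is_in_range()`; it only mirrors the behaviour shown in the calls above.

```r
# Hypothetical stand-in for an inclusive range check:
# both limits count as "in range".
in_range <- function(x, min, max) {
  x >= min & x <= max
}

# Same inputs as the calls above, checked in one vectorized call
result <- in_range(c(1, 10, 11, 0, 5), 1, 10)

# A missing limit propagates NA, which is why open limits need filling first
with_na <- in_range(1, NA, 10)
```

If the limits were meant to be exclusive instead, `>=` and `<=` would become `>` and `<`, and the first two values would fall out of range.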

maurolepore commented 5 years ago

(Now following up at https://github.com/forestgeo/allodb/issues/84.)

The dbh ranges are different for different parts of a tree (e.g. dbh_min_mm is 142 and 25 for rows 6 and 7 below). If a tree's dbh is within the range of some but not all of the equations that make up the entire body of a tree, then the biomass will be summed for only the body parts in the adequate range. In other words, some parts of a tree will not be represented at all, and this could result in underestimating biomass. There is currently no way for the code to know if a crucial equation has been lost.

@teixeirak, Maybe a new column should record parts of a whole? E.g. a set of equations with 3 parts should be labelled like "1/3", "2/3", and "3/3"?

#>    rowid anatomic_relevance          dbh dbh_min_mm dbh_max_mm
#>  6   228 stem (wood and bark)      194.       142.       267. 
#>  7   228 foliage total             194.        25        400  

...
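
A rough sketch of how such a "parts of a whole" column could be used to flag incomplete sets. Everything here is an assumption for illustration: the column name `part`, its "index/total" labels, and the dbh value of 100 (chosen so that one part falls out of range); only the limits are taken from the two rows above.

```r
# Hypothetical data: the two equations for rowid 228, with an assumed
# `part` column labelled "<index>/<total>" and a made-up dbh of 100 mm.
eqn <- data.frame(
  rowid = c(228, 228),
  anatomic_relevance = c("stem (wood and bark)", "foliage total"),
  dbh = c(100, 100),
  dbh_min_mm = c(142, 25),
  dbh_max_mm = c(267, 400),
  part = c("1/2", "2/2")
)

# Keep only the equations whose (inclusive) dbh range contains the tree
in_range <- eqn$dbh >= eqn$dbh_min_mm & eqn$dbh <= eqn$dbh_max_mm
kept <- eqn[in_range, ]

# Compare the number of surviving parts per rowid with the expected total
expected <- as.integer(sub(".*/", "", kept$part))
n_kept <- ave(seq_along(kept$rowid), kept$rowid, FUN = length)
kept$complete <- n_kept == expected

kept[, c("rowid", "anatomic_relevance", "part", "complete")]
```

Here only the foliage equation survives, so `complete` is `FALSE` and the code could warn that the summed biomass for this tree is partial.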

maurolepore commented 5 years ago

This can be disregarded. I had suspected that some rowids might have more than one equation that covered the entire tree-body. This came from observing something like the following:

#>    rowid anatomic_relevance          dbh dbh_min_mm dbh_max_mm
#>  8   266 total aboveground biomass  37.6       30        640  
#>  9   266 total aboveground biomass  37.6        3.7       68.3

But I confirm that this is NOT the case, when expert/generic and dbh ranges are considered.


Reprex:

library(tidyverse)
library(allodb)

master_tidy() %>% 
  select(
    site, 
    equation_group, 
    species, 
    equation_id, 
    dependent_variable_biomass_component,
    matches("dbh")
  ) %>% 
  filter(site == "scbi") %>% 
  filter(str_detect(dependent_variable_biomass_component, "Total|Whole")) %>% 
  unique() %>% 
  arrange(species, dbh_min_cm) %>% 
  group_by(equation_group) %>% 
  filter(
    # `str_detect()` takes the string first, then the pattern
    str_detect(
      paste0(sort(unique(equation_group)), collapse = ", "), "^Expert$"
    )
  ) %>% 
  add_count(species) %>%
  filter(n > 1) %>%
  select(-n)

maurolepore commented 5 years ago

I'm closing this now. Although some issues remain, they have been extracted as independent issues of their own.