Figure out why numbers seem wrong

Aariq commented 1 year ago

NPP seems way too high AGB seems way too low transpiration seems way too low

[x] Inspect raw data coming from .h5 files
[x] Triple-check calculations and conversions
[x] Ask in ED2 discussions if I'm getting the units and conversions right

Aariq commented 1 year ago

Without summing cohorts or doing any calculations, AGB in units of kg/plant seems reasonable, there's just a lot of pine seedlings

Screenshot 2023-02-17 at 3 51 22 PM

Plant density seems reasonable (again, a lot of pine seedlings)

Screenshot 2023-02-17 at 3 53 41 PM

Summing cohorts and multiplying by NPLANT should also give reasonable results, right?

NPP does seem funky, especially since these values are still per cohort and not summed per patch. I assume I should sum the cohorts, yeah? All cohorts of a PFT are growing at the same time in the same patch, so if we want a NPP for setaria, I should add all the setaria cohorts individual NPP together, right?

Screenshot 2023-02-17 at 4 06 09 PM

Aariq commented 1 year ago

Asked in ED2 discussions: https://github.com/EDmodel/ED2/discussions/381

Aariq commented 1 year ago

Old version of model2netcdf.ED2() only pulled out DBH, which is in per-plant units with a note that the conversion they do (multiplying by number of plants) is only applicable for variables that are per-plant. So as far as I can tell, there's never been a (working) version of model2netcdf.ED2() that extracts cohort-level data on NPP to compare with. The closest comparison would be running a single-PFT run and using the tower file values, which are polygon-level (but the R code probably won't be helpful there since the data comes out of ED2 already aggregated to the polygon level)

Aariq commented 1 year ago

maybe I can use the -T- files to check my math somehow though?

Aariq commented 1 year ago

Numbers aren't even internally consistent:

library(tidync)
library(ncdf4)
library(tidyverse)
library(lubridate)

e_file <- "/data/output/pecan_runs/transect_runs/ed2_testout/out/ENS-00002-76/analysis-E-2004-07-00-000000-g01.h5"
t_file <- "/data/output/pecan_runs/transect_runs/ed2_testout/out/ENS-00002-76/analysis-T-2004-00-00-000000-g01.h5"

# -E- file ----------------------------------------------------------------

# look up metadata
e_nc <- nc_open(e_file)
ncatt_get(e_nc, "MMEAN_NPPDAILY_CO")
ncatt_get(e_nc, "MMEAN_NPPDAILY_PY")
ncatt_get(e_nc, "NPLANT")
ncatt_get(e_nc, "PFT")
ncatt_get(e_nc, "PACO_N")
ncatt_get(e_nc, "SIPA_N")
ncatt_get(e_nc, "PYSI_N")

# Extract cohort-level variables from E file
e_cohort <-
  tidync(e_file) |> 
  activate("D0") |> 
  hyper_tibble(select_var = c(
    "MMEAN_NPPDAILY_CO", #"Monthly mean - Net primary productivity - total [kgC/m2/yr]" 
    "PFT", #ED2 PFT number
    "NPLANT" # "Plant density [plant/m2]"
    ))

# Extract polygon-level variables from E file
e_polygon <- 
  tidync(e_file) |>
  activate("D1") |>
  hyper_tibble(select_var = c(
    "AREA",
    "MMEAN_NPPDAILY_PY", # "Monthly mean - Net primary productivity - total [kgC/m2/yr]"
    "PACO_ID", # "First index for patch
    "PACO_N", # Cohort count in each patch
    "SIPA_N", # Number of patches per site
    "PYSI_N" # Number of sites per polygon 
    ))

# rolling-join (not actually necessary in this single-patch example)
e_df <- 
  left_join(e_cohort, e_polygon, join_by(closest(phony_dim_0 >= PACO_ID)))

e_df
# There's only one patch in this polygon. Total NPP of the patch should be the sum of the NPP of the two cohorts...
e_df |> 
  summarize(npp_total = sum(MMEAN_NPPDAILY_CO))
# ... but it's not
4.53 != 0.906

# -T- file ----------------------------------------------------------------

# The -T- file has hourly data for the whole year

t_df <-
  tidync(t_file) |> 
  activate("D1,D0") |> 
  hyper_tibble(select_var = c("FMEAN_NPP_PY")) # "Sub-daily mean - Net primary productivity [kgC/m2/yr]"

# Check units
t_nc<-nc_open(t_file)
ncatt_get(t_nc, "FMEAN_NPP_PY")

# Slice to only keep rows that match -E- file
int <- ymd("2004-07-01") %--% ymd("2004-08-01")
as.period(int, unit = "hours")
t_df <- t_df |> slice_head(n = 744)

# Monthly mean of NPP from this file should be the same as MMEAN_NPPDAILY_PY from the -E- file
t_df |> 
  summarize(mean = mean(FMEAN_NPP_PY))
#... but it's not (0.478 != 0.906)

Aariq commented 1 year ago

Units were wrong and need to weight by AREA. things add up now. Updated data extraction code: https://github.com/cct-datascience/ed2-mandifore/pull/24/commits/574e2cb8197b5b2fcaa2879246a2e7bebe408693

cct-datascience / ed2-mandifore

Figure out why numbers seem wrong #21