chuhousen / amerifluxr

An R programmatic interface for AmeriFlux data and metadata
https://chuhousen.github.io/amerifluxr/

`amf_read_base` sets timezone incorrectly when `parse_timestamp=TRUE` #95

Open s-kganz opened 1 year ago

s-kganz commented 1 year ago

Hello,

I'm working on joining AmeriFlux EC data with quality flags from NSF NEON. The NEON data portal always exports data in UTC, while AmeriFlux encodes timestamps in local standard time (ignoring daylight saving time). However, amf_read_base labels the timestamp field as UTC without applying an offset, so timestamps downloaded with amerifluxr and neonUtilities are misaligned. I understand this may be intended behavior and that the user is expected to set the time zone themselves, but the documentation for amf_read_base reads (to me, at least) as if the timestamp is converted.

We can verify this by subtracting the appropriate UTC offset from the timestamps in amerifluxr downloads. After doing so, they align with the equivalent data downloaded with neonUtilities. Here is a reprex; just set the user/email in the amf_download_base call:

library(tidyverse)
library(lubridate)
library(neonUtilities)
library(amerifluxr)

# Script still works without a token but it goes faster with one
fileName <- "data_in/neon_token"
if (file.exists(fileName)) {
  token <- readChar(fileName, file.info(fileName)$size)
} else {
  token <- NA_character_
}

# Download IR biological temperature from NEON.
prod <- loadByProduct(
  "DP1.00005.001",
  site="WREF",
  tabl="IRBT_30_minute",
  token=token,
  nCores=1,
  startdate="2020-06",
  enddate="2020-07",
  check.size=FALSE
)

neon_tc <- prod$IRBT_30_minute %>%
  # Upper canopy radiometer, equivalent to T_CANOPY_1_1_1 in ameriflux product
  filter(str_c(horizontalPosition, verticalPosition, sep=".") == "000.060") %>%
  select(startDateTime, bioTempMean)

# Now download EC from ameriflux and process it a little to match the NEON
# product.
ameriflux_tc <- amf_download_base(
  "your_username",
  "your_email",
  site="US-xWR",
  data_policy="CCBY4.0",
  agree_policy=TRUE,
  intended_use = "other",
  intended_use_text = "testing amerifluxr",
  verbose = TRUE,
  out_dir = tempdir()
) %>% 
  amf_read_base(
    file = .,
    unzip = TRUE,
    parse_timestamp = TRUE
  ) %>% 
  amf_filter_base() %>%
  filter(YEAR == 2020, MONTH == 6) %>%
  mutate(TIMESTAMP_START = TIMESTAMP - minutes(15)) %>%
  select(TIMESTAMP_START, T_CANOPY_1_1_1)

# Join the two tables
join_misaligned <- left_join(ameriflux_tc, neon_tc, 
                             by=c("TIMESTAMP_START"="startDateTime"))

all.equal(join_misaligned$bioTempMean, join_misaligned$T_CANOPY_1_1_1)

# Now try again, subtracting the UTC offset for PST (UTC-8). Subtracting
# -8 hours adds 8 hours, which converts local standard time to UTC.
join_aligned <- ameriflux_tc %>%
  mutate(TIMESTAMP_START = TIMESTAMP_START - hours(-8)) %>%
  left_join(neon_tc, by=c("TIMESTAMP_START"="startDateTime"))

all.equal(join_aligned$bioTempMean, join_aligned$T_CANOPY_1_1_1)

s-kganz commented 1 year ago

Just poking around the repo, it seems like it would take a little effort to make the UTC offsets available when parsing the timestamp. One idea might be to add another field at the site_display/Ameriflux endpoint so they are available through amf_site_info. If that's not feasible, I would suggest clarifying in the documentation that the user has to set the timezone themselves.
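
For anyone who needs a workaround in the meantime, here is a minimal base R sketch of setting the time zone yourself. The helper `to_true_utc` is hypothetical (it is not part of amerifluxr), and it assumes you already know the site's UTC offset in hours, e.g. -8 for PST:

```r
# Hypothetical helper (not part of amerifluxr): take a timestamp that holds
# local standard time but is labeled UTC, and shift it to true UTC.
to_true_utc <- function(timestamp, utc_offset_hours) {
  # Local standard time = UTC + offset, so UTC = local - offset.
  out <- as.POSIXct(timestamp, tz = "UTC") - utc_offset_hours * 3600
  attr(out, "tzone") <- "UTC"
  out
}

# Midnight PST on 2020-06-01, labeled UTC the way the parsed TIMESTAMP is
local_labeled_utc <- as.POSIXct("2020-06-01 00:00:00", tz = "UTC")
true_utc <- to_true_utc(local_labeled_utc, utc_offset_hours = -8)

format(true_utc, "%Y-%m-%d %H:%M", tz = "UTC")
# "2020-06-01 08:00"
```

Because AmeriFlux uses local standard time year-round, a fixed offset is enough here; no daylight saving rules are involved.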

chuhousen commented 1 year ago

Hi Keenan Ganz --

Thanks for writing and sharing this. All NEON data downloaded from AmeriFlux are in local standard time, unlike data from the NEON portal, which are in UTC.

As you pointed out, there's currently no straightforward way to assign the time zone when parsing the BASE file itself, because UTC_OFFSET needs to be parsed from the downloaded metadata (BADM). That's why I had to compromise and force the time zone to UTC (even though the values are not in UTC) for the amf_read_base() TIMESTAMP output. I hinted at this in the documentation https://search.r-project.org/CRAN/refmans/amerifluxr/html/amf_read_base.html, but I could be more specific. I also thought about dropping the TIMESTAMP output entirely to avoid this. Adding UTC_OFFSET to the API might be a good direction. I'll look into the possibility.
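
As a rough illustration of the metadata lookup, the sketch below pulls UTC_OFFSET out of a BADM-style table using base R only. The long layout (SITE_ID, VARIABLE_GROUP, VARIABLE, DATAVALUE) is an assumption about how the downloaded BADM data are organized, the table is a mock with illustrative values (the second site ID is made up), and `get_utc_offset` is a hypothetical helper, not an amerifluxr function:

```r
# Mock BADM table in an assumed long layout; values are illustrative only.
badm <- data.frame(
  SITE_ID        = c("US-xWR", "US-xWR", "XX-Tst"),
  VARIABLE_GROUP = c("GRP_UTC_OFFSET", "GRP_UTC_OFFSET", "GRP_UTC_OFFSET"),
  VARIABLE       = c("UTC_OFFSET", "UTC_OFFSET_COMMENT", "UTC_OFFSET"),
  DATAVALUE      = c("-8", "local standard time", "-5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the UTC offset (in hours) for one site.
get_utc_offset <- function(badm, site_id) {
  rows <- badm$SITE_ID == site_id & badm$VARIABLE == "UTC_OFFSET"
  offset <- as.numeric(badm$DATAVALUE[rows])
  # Stop rather than guess if the site has no entry or several entries.
  if (length(offset) != 1) {
    stop("Expected exactly one UTC_OFFSET entry for site ", site_id)
  }
  offset
}

get_utc_offset(badm, "US-xWR")
# -8
```

The helper stops when a site has zero or multiple UTC_OFFSET entries, since choosing among several would need additional information that this sketch does not model.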

Thanks --


--

Housen Chu

Research Scientist, Climate and Ecosystem Sciences Division, Lawrence Berkeley National Lab
phone: 510-486-6138
website: https://eesa.lbl.gov/profiles/housen-chu/