Closed lindsayplatt closed 2 years ago
^ Noticing now that NWISWeb is also reporting in PDT (which is what my fxn, format(..., tz = 'America/Los_Angeles'), converted to). However, the tz_cd associated with this gage is PST. So, does that mean the dateTimes returned from NWISWeb will always be PST no matter what time of year it is? If so, this is not really a bug, but I may suggest we write a tweet or blog post sharing methods for these tricky timezone things. Sounds like I need a crosswalk function between the daylight savings tz codes and standard time. An argument to readNWISuv() to request that data be returned in the local timezone would be neat, too, but definitely a bigger effort.
Yeah, this is a weird site.
If you look at the raw data: https://waterservices.usgs.gov/nwis/iv/?site=373904118570701&format=waterml,1.1&ParameterCd=72019&startDT=2022-09-06&endDT=2022-09-07 Hidden in the metadata is:
<ns1:timeZoneInfo siteUsesDaylightSavingsTime="false">
Which means this site doesn't observe daylight savings time. The first raw data value to show up is: 2022-09-06T00:00:00.000-08:00
Midnight on Sept. 6 with an offset of 8 hours from UTC is 1am PDT (but since the site doesn't use daylight savings, it's displaying it in PST).
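To make the offset arithmetic concrete, here is a minimal base R sketch that is not part of the original thread. It assumes only that the raw stamp 2022-09-06T00:00:00.000-08:00 corresponds to 08:00 UTC; the variable names are illustrative.

```r
# Midnight at a fixed -08:00 offset is the same instant as 08:00 UTC.
raw_as_utc <- as.POSIXct("2022-09-06 08:00:00", tz = "UTC")

# Display that instant in the daylight-saving-aware Pacific zone.
shown_pacific <- format(raw_as_utc, "%Y-%m-%d %H:%M:%S",
                        tz = "America/Los_Angeles", usetz = TRUE)
print(shown_pacific)
# [1] "2022-09-06 01:00:00 PDT"
```

The one-hour gap between the raw stamp and the displayed value is the difference between the site's fixed -08:00 offset and the -07:00 offset that America/Los_Angeles uses in September.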
Going to my trusty Wikipedia sites on timezones: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones It seems like there's not actually any official timezone that is only PST (they all include PDT).
Anyway...more importantly, there IS a timezone argument in readNWISuv:
data_query_all <- readNWISuv(siteNumbers = '373904118570701',
parameterCd = '72019',
startDate = '2022-09-06',
endDate = '2022-09-07',
tz = "America/Los_Angeles")
data_query_all$dateTime[1]
[1] "2022-09-06 01:00:00 PDT"
Normally I'd tell you to pick a timezone from that wikipedia page that doesn't have daylight savings (like in the east, "America/Jamaica", in the midwest "America/Guatemala") ...but the Pacific doesn't seem to have one!
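One option the thread does not try, offered here as an untested assumption for this workflow: the tz database also has fixed-offset zones under Etc/, and Etc/GMT+8 stays 8 hours behind UTC all year (the sign in the name is inverted relative to the usual notation). The sketch below uses base R only; whether the tz argument of readNWISuv accepts this name was not checked here.

```r
# Two instants at 08:00 UTC, one in winter and one in summer.
utc_times <- as.POSIXct(c("2022-01-15 08:00:00", "2022-09-06 08:00:00"),
                        tz = "UTC")

# Etc/GMT+8 has no daylight saving rule, so both show as local midnight.
fixed_pst <- format(utc_times, "%Y-%m-%d %H:%M:%S", tz = "Etc/GMT+8")
print(fixed_pst)
# [1] "2022-01-15 00:00:00" "2022-09-06 00:00:00"
```

The printed zone abbreviation for Etc/ zones varies with the tz database version, so the example formats the times without it.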
There's a bit of discussion on time zones here: https://rconnect.usgs.gov/dataRetrieval/articles/tutorial.html#timetime-zone-discussion I also try to describe it a bit in the "tz" argument in the help file: https://rconnect.usgs.gov/dataRetrieval/reference/readNWISuv.html
I've run into a few sites over the years that don't use daylight savings even though they are in regions that use daylight savings....it's kind of always a pain.
Let me know what kind of info would be more helpful.
Thanks for all of this. I think it will take me a bit to dig through and see what I should use in my application. Definitely not a dataRetrieval bug, so feel free to close! I might just document any changes I make based on this here for others (and myself) to reference in the future.
There were a bunch of other sites that had this issue, too, so I need a pretty programmatic way to address it. I am testing out using this function to convert from daylight time into standard time (and vice versa if necessary) based on the timezone abbreviation that is returned by readNWISsite(), which is passed as the value for tz_desired. This is applied after the dates are converted from UTC.
# Note that the output from this fxn will say 'PDT' but mean 'PST'
# because you can't have a timezone of 'PST' (it will convert to GMT,
# even when using `lubridate::force_tz(., 'PST')`). This is used
# internally before dropping time and going to a day, so I am
# accepting the risk.
adjust_for_daylight_savings <- function(posix_dates, tz_desired) {
  # To go from daylight time (DT) to standard time
  # (ST), subtract an hour and vice versa.
  tz_conversion_xwalk <- tibble(
    from = c('DT', 'ST'),
    to = c('ST', 'DT'),
    conversion_sec = c(-3600, 3600)
  )
  tz_current <- unique(format(posix_dates, "%Z"))
  if(tz_current == tz_desired) {
    # Don't do anything if they already have the desired tz
    return(posix_dates)
  } else {
    # Use the last two characters in both the current and desired
    # timezones for matching with the conversion xwalk
    tz_conversion_now <- tibble(
      from = stringr::str_sub(tz_current, -2, -1),
      to = stringr::str_sub(tz_desired, -2, -1)
    )
    conversion_to_use <- tz_conversion_xwalk %>%
      semi_join(tz_conversion_now, by = c('from', 'to')) %>%
      pull(conversion_sec)
    return(posix_dates + conversion_to_use)
  }
}
# Using the function
tz_locale <- 'America/Los_Angeles'
tz_abbr <- readNWISsite('373904118570701') %>% pull(tz_cd)
data_query_all <- readNWISuv(siteNumbers = '373904118570701',
parameterCd = '72019',
startDate = '2022-09-06',
endDate = '2022-09-07')
format(data_query_all$dateTime, "%Y-%m-%d %H:%M:%S",
tz=tz_locale, usetz=TRUE) %>%
as.POSIXct(tz=tz_locale) %>%
adjust_for_daylight_savings(tz_desired = tz_abbr)
I think you probably need to test that code when the daylight savings times switch:
data_query_all <- readNWISuv(siteNumbers = '373904118570701',
parameterCd = '72019',
startDate = '2022-03-11',
endDate = '2022-03-13')
format(data_query_all$dateTime, "%Y-%m-%d %H:%M:%S",
tz=tz_locale, usetz=TRUE) %>%
as.POSIXct(tz=tz_locale) %>%
adjust_for_daylight_savings(tz_desired = tz_abbr)
Error in if (tz_current == tz_desired) { : the condition has length > 1
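The error appears because this date range crosses the spring switch (2022-03-13 in America/Los_Angeles), so unique(format(posix_dates, "%Z")) returns two abbreviations and the if() condition is no longer length one. A small base R sketch with two synthetic timestamps (not real gage data) illustrates this:

```r
tz_locale <- "America/Los_Angeles"

# Two instants five minutes apart, on either side of the 2022 spring switch.
spanning <- as.POSIXct(c("2022-03-13 09:55:00", "2022-03-13 10:00:00"),
                       tz = "UTC")

# The abbreviations differ, so the comparison used in if() has length 2.
tz_seen <- unique(format(spanning, "%Z", tz = tz_locale))
print(tz_seen)
# [1] "PST" "PDT"
print(length(tz_seen == "PST"))
# [1] 2
```

Recent R versions (4.2.0 and later) treat a length-2 condition in if() as an error; older versions only warned and used the first element.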
I found a solution where I changed adjust_for_daylight_savings() slightly so that it will work for any number of timezones at once. Again, this works for me because in the end I am summarizing to daily values and want my days to line up with the timezone that NWIS reports.
# Note that the output from this fxn will say 'PDT' but mean 'PST'
# because you can't have a timezone of 'PST' (it will convert to GMT,
# even when using `lubridate::force_tz(., 'PST')`). This is used
# internally before dropping time and going to a day, so I am
# accepting the risk.
adjust_for_daylight_savings <- function(posix_dates, tz_desired) {
  # To go from daylight time (DT) to standard time
  # (ST), subtract an hour and vice versa. If the
  # `from` and `to` values are the same, don't
  # change anything about the dates.
  tz_conversion_xwalk <- tibble(
    from = c('DT', 'ST', 'ST', 'DT'),
    to = c('ST', 'DT', 'ST', 'DT'),
    conversion_sec = c(-3600, 3600, 0, 0)
  )
  # There could be more than one timezone if the date range spans across
  # the standard to daylight savings switch. Thus, we should be able to
  # convert each date independently (which happens in this piped sequence)
  tibble(
    in_dates = posix_dates,
    in_tz = format(posix_dates, "%Z")
  ) %>%
    mutate(
      # Use the last two characters in both the current and desired
      # timezones for matching with the conversion xwalk
      from = stringr::str_sub(in_tz, -2, -1),
      to = stringr::str_sub(tz_desired, -2, -1)
    ) %>%
    # Join in conversion xwalk (explicit `by` avoids the join message)
    left_join(tz_conversion_xwalk, by = c('from', 'to')) %>%
    # Alter the date values to match the desired timezone.
    mutate(out_dates = in_dates + conversion_sec) %>%
    # Pull out just the dates to return
    pull(out_dates)
}
library(dplyr)
library(dataRetrieval)
tz_locale <- 'America/Los_Angeles'
tz_abbr <- readNWISsite('373904118570701') %>% pull(tz_cd)
# Example when the query range is during daylight time and not
# standard time - the output times appear to be missing the
# first hour (see `head()` below) and spill over to the next
# day by an hour at the end (see `tail()` below).
data_query <- readNWISuv(siteNumbers = '373904118570701',
parameterCd = '72019',
startDate = '2022-09-06',
endDate = '2022-09-07')
data_query_fmt <- format(data_query$dateTime, "%Y-%m-%d %H:%M:%S",
tz=tz_locale, usetz=TRUE) %>%
as.POSIXct(tz=tz_locale)
head(data_query_fmt, 3) # These values need to get switched to match the query range
tail(data_query_fmt, 3) # These values need to get switched to match the query range
Initial output, which is not going to work:
> head(data_query_fmt, 3) # These values need to get switched to match the query range
[1] "2022-09-06 01:00:00 PDT" "2022-09-06 01:05:00 PDT" "2022-09-06 01:10:00 PDT"
> tail(data_query_fmt, 3) # These values need to get switched to match the query range
[1] "2022-09-08 00:45:00 PDT" "2022-09-08 00:50:00 PDT" "2022-09-08 00:55:00 PDT"
Apply the adjust_for_daylight_savings() function.
# We can adjust these dates to appear as `PST` values, so that they line up
# with the queried days. Then, when we summarize per day, the outputs will
# match our intended date range.
data_query_adj <- data_query_fmt %>%
adjust_for_daylight_savings(tz_desired = tz_abbr)
head(data_query_adj, 3)
tail(data_query_adj, 3)
Output following the adjustment, where data falls in the appropriate date range:
> head(data_query_adj, 3)
[1] "2022-09-06 00:00:00 PDT" "2022-09-06 00:05:00 PDT" "2022-09-06 00:10:00 PDT"
> tail(data_query_adj, 3)
[1] "2022-09-07 23:45:00 PDT" "2022-09-07 23:50:00 PDT" "2022-09-07 23:55:00 PDT"
# This approach still works across the daylight savings switch, when PST changes to PDT:
data_query_switch <- readNWISuv(siteNumbers = '373904118570701',
parameterCd = '72019',
startDate = '2022-03-12',
endDate = '2022-03-13')
data_query_switch_fmt <- format(data_query_switch$dateTime, "%Y-%m-%d %H:%M:%S",
tz=tz_locale, usetz=TRUE) %>%
as.POSIXct(tz=tz_locale)
head(data_query_switch_fmt, 3) # These are fine and don't need to get switched
tail(data_query_switch_fmt, 3) # These values need to switch to PST to match the query range
Initial, problematic data output:
> head(data_query_switch_fmt, 3) # These are fine and don't need to get switched
[1] "2022-03-12 00:00:00 PST" "2022-03-12 00:05:00 PST" "2022-03-12 00:10:00 PST"
> tail(data_query_switch_fmt, 3) # These values need to switch to PST to match the query range
[1] "2022-03-14 00:45:00 PDT" "2022-03-14 00:50:00 PDT" "2022-03-14 00:55:00 PDT"
Applying the adjust_for_daylight_savings() function:
data_query_switch_adj <- data_query_switch_fmt %>%
adjust_for_daylight_savings(tz_desired = tz_abbr)
head(data_query_switch_adj, 3)
tail(data_query_switch_adj, 3)
Appropriately adjusted dates:
> head(data_query_switch_adj, 3)
[1] "2022-03-12 00:00:00 PST" "2022-03-12 00:05:00 PST" "2022-03-12 00:10:00 PST"
> tail(data_query_switch_adj, 3)
[1] "2022-03-13 23:45:00 PDT" "2022-03-13 23:50:00 PDT" "2022-03-13 23:55:00 PDT"
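For comparison, the same per-timestamp idea can be sketched in base R without the crosswalk join. This is an alternative under stated assumptions, not the function used above: shift_to_standard() is a hypothetical helper, it only handles the daylight-to-standard direction, and like the function above it leaves the printed abbreviation as PDT even though the clock values are now standard time.

```r
# Hypothetical helper: subtract one hour from any timestamp that the
# object's own time zone flags as daylight time; leave the rest unchanged.
shift_to_standard <- function(posix_dates) {
  is_daylight <- as.POSIXlt(posix_dates)$isdst > 0
  posix_dates - ifelse(is_daylight, 3600, 0)
}

# Synthetic instants (not gage data): one in standard time, one in daylight time.
local_times <- as.POSIXct(c("2022-03-12 08:00:00", "2022-09-06 08:00:00"),
                          tz = "UTC")
attr(local_times, "tzone") <- "America/Los_Angeles"

shifted <- shift_to_standard(local_times)
shifted_text <- format(shifted, "%Y-%m-%d %H:%M:%S")
print(shifted_text)
# [1] "2022-03-12 00:00:00" "2022-09-06 00:00:00"
```

Because the shift is decided per element from isdst, a range that spans the switch needs no special handling.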
Describe the bug I am using the UV service to query sites from across the country. I supply a date range to the query expecting to get data back between those dates in local time. The data is returned as UTC but once you convert back to local time, I would expect to see all instantaneous data between 00:00:00 on the first date of the query and 23:59:59 on the last day. I was just digging into an issue I was seeing where I unexpectedly had data which spilled into the day after my specified dates (after converting to local time). I also noticed that I am missing data from the beginning of the query, too.
To Reproduce Steps to reproduce the behavior:
Console output:
Expected behavior I would expect to have data ranging from 2022-09-06 00:00:00 to 2022-09-07 23:59:59 after converting from UTC to the local time of the gage using America/Los_Angeles, not ranging from 2022-09-06 01:00:00 to 2022-09-08 00:55:00, as I do above.
Screenshots If applicable, add screenshots to help explain your problem.
Session Info Please include your session info:
Additional context It almost looks like the data returned is shifted by an hour, which makes me think this is actually an issue with daylight savings - something with PST vs PDT? However, when I look at my values (console printouts above) from 1 AM to 1:30 AM on Sep 6th with NWISWeb (see screenshots below), they line up (at 1:25 AM the values equal 333.45 and at 1:30 AM they switch to 333.44; if the values were just shifted an hour, that switch in value would be the opposite direction and between :20 and :25, not :25 and :30). So, I am not sure if we are both wrong or what could be done about the query.