Open trevorld opened 1 year ago
Thank you again for another thoughtful write-up!
The weekday parsing is definitely underdeveloped. It leans primarily on strptime
to try to convert yearweeks. I must admit that this isn't a terribly familiar format for me, and reading through the spec more thoroughly I think I missed some things.
Before jumping into changes, I'm curious - do you have a use case where yearweeks are frequently used?
Just to itemize the things that this brought to my attention:
strptime
using %U
(US convention, not %V
, the ISO-8601 convention). ISO uses Monday as the start of the week and the US convention uses Sunday as the start of the week. If everything else was working properly, this is already off by 1. %V
is allowed but ignored, so a correct implementation would need to lean on another library or try to post-process the imputed date by handling the off-by-one issue.Even thought strptime
%U
purports to use a range 00-53
, it seems to parse the same values for both 00
and 01
:
strptime("2009-00-0", "%Y-%U-%w")
## [1] "2009-01-04 UTC" # from ?strptime, the week 1 should be the first Sunday, expected 2008
strptime("2009-01-0", "%Y-%U-%w")
## [1] "2009-01-04 UTC"
strptime("2009-02-0", "%Y-%U-%w")
## [1] "2009-01-11 UTC"
Just using this example that you gave for discussion:
parttime::as.parttime("2009-W02") # although day is unknown year-month is definitely 2009-01
## Warning in warn_repr_data_loss(x, includes = "week", excludes = "weekday") :
## Date strings including week and excluding weekday can not be fully
## represented. To avoid loss of datetime resolution, such partial dates
## are best represented as timespans. See `?timespan`.
## <partial_time<YMDhms+tz>[1]>
## [1] "2009"
Parsing to "2009-01"
would be feasible. It would effectively parse first as a timespan (parsing the lower and upper bounds) and then coerce to a parttime. This would produce a month when the week is entirely within the month, or omit the month in cases where a week spans multiple months.
Since I wasn't aware of many use cases for this behavior, trying to be exhaustive here wasn't prioritized.
Before jumping into changes, I'm curious - do you have a use case where yearweeks are frequently used?
Although I have my Google calendar set to also show the ISO week number and I use ISO week dates in my bullet journal I don't have a use case where I need to parse them in R. If I did need to parse them I can first parse them with {datetimeoffset}
(which uses {clock}
in the background to convert to the Gregorian calendar) and if needed convert to {parttime}
:
library(datetimeoffset)
library(parttime)
as_datetimeoffset("2009-W53-7") |> as.parttime()
<partial_time<YMDhms+tz>[1]>
[1] "2010-01-03"
as_datetimeoffset("2009-W01-1") |> as.parttime()
<partial_time<YMDhms+tz>[1]>
[1] "2008-12-29"
as_datetimeoffset("2009-W01") |> as.parttime()
<partial_time<YMDhms+tz>[1]>
[1] NA
as_datetimeoffset("2009-W02") |> as.parttime()
<partial_time<YMDhms+tz>[1]>
[1] "2009-01"
Just my 0.02 monetary units. In Epidemiology is routine to use year-week formats, in fact the epidemiological week is based on the ISO week, but with small differences, such as the first day of the week. Check the implementation of those two in lubridate (https://github.com/tidyverse/lubridate/blob/2f3ab321511ac28ab95f729c8fa36fb9170994ea/R/accessors-week.r#L59).
At least for epiweeks, when those are used in a time series, they get mapped to the Sunday (first day of the epi week), or Wednesday (middle of the epi week).
Another good source on the calculations using year-week, is the old package EpiWeek. You can get the code from its archive: https://cran.r-project.org/src/contrib/Archive/EpiWeek/ -- there is a good description on how to define epi weeks there too
Here is a set of epi week calendars, you can use to test against: https://www.cmmcp.org/mosquito-surveillance-data/pages/epi-week-calendars-2008-2024 -- there is a good description on how to define epi weeks there too
I'm not sure how important the ISO week date parsing is but I'm observing what I suspect are bugs when the ISO week date "year" is different from the Gregorian date "year" when parsing ISO week dates