dgkf / parttime

Work-in-progress R package for handling partial datetimes
https://dgkf.github.io/parttime
Other
17 stars 1 forks source link

ISO 8601 week date edge cases #50

Open trevorld opened 1 year ago

trevorld commented 1 year ago

I'm not sure how important the ISO week date parsing is but I'm observing what I suspect are bugs when the ISO week date "year" is different from the Gregorian date "year" when parsing ISO week dates

parttime::as.parttime("2009-W53-7") # 2010-01-03
Warning in strptime(paste(x[, "year"], x[, "week"], x[, "weekday"] - 1L,  :
  (0-based) yday 373 in year 2009 is invalid
<partial_time<YMDhms+tz>[1]> 
[1] "2009" 
parttime::as.parttime("2009-W01-1") # 2008-12-29
<partial_time<YMDhms+tz>[1]> 
[1] "2009-01-04" 
clock::iso_year_week_day(2009, 1, 1) |> as_year_month_day()
<year_month_day<day>[1]>
[1] "2008-12-29"
parttime::as.parttime("2009-W02") # although day is unknown year-month is definitely 2009-01
Warning in warn_repr_data_loss(x, includes = "week", excludes = "weekday") :
  Date strings including week and excluding weekday can not be fully
represented. To avoid loss of datetime resolution, such partial dates
are best represented as timespans. See `?timespan`.
<partial_time<YMDhms+tz>[1]> 
[1] "2009" 
parttime::as.parttime("2009-W01") # unknown year since includes days in 2008 and 2009
Warning in warn_repr_data_loss(x, includes = "week", excludes = "weekday") :
  Date strings including week and excluding weekday can not be fully
represented. To avoid loss of datetime resolution, such partial dates
are best represented as timespans. See `?timespan`.
<partial_time<YMDhms+tz>[1]> 
[1] "2009" 
dgkf commented 1 year ago

Thank you again for another thoughtful write-up!

The weekday parsing is definitely underdeveloped. It leans primarily on strptime to try to convert yearweeks. I must admit that this isn't a terribly familiar format for me, and reading through the spec more thoroughly I think I missed some things.

Before jumping into changes, I'm curious - do you have a use case where yearweeks are frequently used?

Bugs

Just to itemize the things that this brought to my attention:

Behaviors

Just using this example that you gave for discussion:

parttime::as.parttime("2009-W02") # although day is unknown year-month is definitely 2009-01
## Warning in warn_repr_data_loss(x, includes = "week", excludes = "weekday") :
##   Date strings including week and excluding weekday can not be fully
## represented. To avoid loss of datetime resolution, such partial dates
## are best represented as timespans. See `?timespan`.
## <partial_time<YMDhms+tz>[1]> 
## [1] "2009" 

Parsing to "2009-01" would be feasible. It would effectively parse first as a timespan (parsing the lower and upper bounds) and then coerce to a parttime. This would produce a month when the week is entirely within the month, or omit the month in cases where a week spans multiple months.

Since I wasn't aware of many use cases for this behavior, trying to be exhaustive here wasn't prioritized.

trevorld commented 1 year ago

Before jumping into changes, I'm curious - do you have a use case where yearweeks are frequently used?

Although I have my Google calendar set to also show the ISO week number and I use ISO week dates in my bullet journal I don't have a use case where I need to parse them in R. If I did need to parse them I can first parse them with {datetimeoffset} (which uses {clock} in the background to convert to the Gregorian calendar) and if needed convert to {parttime}:

library(datetimeoffset)
library(parttime)
as_datetimeoffset("2009-W53-7") |> as.parttime()
<partial_time<YMDhms+tz>[1]> 
[1] "2010-01-03" 
as_datetimeoffset("2009-W01-1") |> as.parttime()
<partial_time<YMDhms+tz>[1]> 
[1] "2008-12-29" 
as_datetimeoffset("2009-W01") |> as.parttime()
<partial_time<YMDhms+tz>[1]> 
[1] NA 
as_datetimeoffset("2009-W02") |> as.parttime()
<partial_time<YMDhms+tz>[1]> 
[1] "2009-01" 
jmcastagnetto commented 2 months ago

Just my 0.02 monetary units. In Epidemiology is routine to use year-week formats, in fact the epidemiological week is based on the ISO week, but with small differences, such as the first day of the week. Check the implementation of those two in lubridate (https://github.com/tidyverse/lubridate/blob/2f3ab321511ac28ab95f729c8fa36fb9170994ea/R/accessors-week.r#L59).

At least for epiweeks, when those are used in a time series, they get mapped to the Sunday (first day of the epi week), or Wednesday (middle of the epi week).

Another good source on the calculations using year-week, is the old package EpiWeek. You can get the code from its archive: https://cran.r-project.org/src/contrib/Archive/EpiWeek/ -- there is a good description on how to define epi weeks there too

Here is a set of epi week calendars, you can use to test against: https://www.cmmcp.org/mosquito-surveillance-data/pages/epi-week-calendars-2008-2024 -- there is a good description on how to define epi weeks there too