dgkf / parttime

Work-in-progress R package for handling partial datetimes
https://dgkf.github.io/parttime

Should secfrac be NA when parsing information without second fractions? #30

Closed billdenney closed 2 years ago

billdenney commented 2 years ago

The second row below has a zero for the secfrac part even though no second fractions are present in the input. It seems like the secfrac column should be missing (NA) for row 2 instead. Do you agree? If not, can you please explain the rationale for including a non-missing secfrac?

as.data.frame(as.parttime(c("2003-12-15T13:14:17.123", "2003-12-15T13:14:17", "2003-12-15T13:14")))
#                         year month day hour min sec secfrac tzhour tzmin
# 2003-12-15T13:14:17.123 2003    12  15   13  14  17   0.123      0     0
# 2003-12-15T13:14:17     2003    12  15   13  14  17   0.000      0     0
# 2003-12-15T13:14        2003    12  15   13  14  NA      NA      0     0

dgkf commented 2 years ago

This was intentional, but it doesn't mean it can't be changed.

The rationale is that fractional parts are only ever precise to the number of significant figures given. So secfrac is invariably "missing" from the (N+1)th significant digit onward. We can consider a missing secfrac as an extension of this (with 0 significant digits).
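To make the significant-figures argument concrete, here is a minimal base R sketch that counts how many fractional-second digits a string actually carries. The helper name `secfrac_digits` is hypothetical and is not part of parttime; it only illustrates that "no secfrac" can be read as "zero significant digits".

```r
# Hypothetical helper (not part of parttime): count the digits written after
# the decimal mark of the seconds field in an ISO 8601-like string.
secfrac_digits <- function(x) {
  # Capture the digits following "Thh:mm:ss." (or "Thh:mm:ss,"), if any
  pattern <- "T[0-9]{2}:[0-9]{2}:[0-9]{2}[.,]([0-9]+)"
  m <- regmatches(x, regexec(pattern, x))
  # A match yields c(full_match, digits); no match yields character(0)
  vapply(m, function(g) if (length(g) == 2) nchar(g[2]) else 0L, integer(1))
}

secfrac_digits(c("2003-12-15T13:14:17.123", "2003-12-15T13:14:17"))
# 3 0
```

Under this reading, the second string is known to 0 fractional digits, which is the case the rationale above treats as an extension of ordinary limited precision.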

I'm glad you asked, because this is one of those decisions that was made primarily based on feel, and I never took the time to sort out why it felt right. The timing is good, too: I have this issue (https://github.com/dgkf/parttime/issues/25) which would codify this decision, and I'm curious to hear how you feel about this rationale before it causes any changes.

billdenney commented 2 years ago

Good points. I reminded myself of the details of the ISO 8601 standard (at least the Wikipedia summary of it), and in that standard, the least significant part can be fractional (so 2003.5 is halfway through the year 2003). So, in theory, when parsing ISO 8601-formatted strings, the fraction could belong to any part and not just seconds.

Short answer, I'd drop secfrac overall (as noted in #25). If fractions are kept overall, I think that they should only be in the structure when they are part of the original character string.
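As an illustration of the second option (keep fractions only when they appear in the original string), here is a minimal base R sketch. The function `parse_sec_parts` and its regular expression are hypothetical and are not parttime's implementation; they only show secfrac coming back as NA when the input carries no fraction.

```r
# Hypothetical parser (base R only; not parttime's implementation).
# Extracts seconds and fractional seconds, leaving each NA when absent.
parse_sec_parts <- function(x) {
  # Groups: 2 = seconds, 4 = digits after the decimal mark
  pattern <- "T[0-9]{2}:[0-9]{2}(:([0-9]{2})([.,]([0-9]+))?)?"
  m <- regmatches(x, regexec(pattern, x))

  # Unmatched optional groups come back as "", which maps to NA here
  pick <- function(g, i, prefix = "") {
    if (length(g) >= i && nzchar(g[i])) as.numeric(paste0(prefix, g[i])) else NA_real_
  }

  data.frame(
    sec     = vapply(m, pick, numeric(1), i = 3),
    secfrac = vapply(m, pick, numeric(1), i = 5, prefix = "0."),
    row.names = x
  )
}

parse_sec_parts(c("2003-12-15T13:14:17.123", "2003-12-15T13:14:17", "2003-12-15T13:14"))
```

With these inputs, only the first row has a non-missing secfrac; the second row has sec = 17 and secfrac = NA, and the third row has both missing.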

Longer answer, things get thorny when trying to figure out the precision of a value other than seconds with a fraction. For example, is the string "2003.5" equivalent to "2003-07" or "2003-07-02T11:59:59.5"? (I'd lean toward the former.)

dgkf commented 2 years ago

Yeah - absolutely agree. I think secfrac is the right place to draw the line, but it's totally subjective.

> For example, is the string "2003.5" equivalent to "2003-07" or "2003-07-02T11:59:59.5"? (I'd lean toward the former.)

Yeah, this is how I see it. I think this should follow the "rule of least surprise" (which is of course subjective). When I look at 2003.5, I'm roughly giving it a significance of ~months. As justification, one decimal digit of a year (0.1 years, or about 1.2 months) maps pretty closely to months. I feel like mapping this to a month, with all other parts missing, best preserves the intended significance.
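A rough base R sketch of that mapping, assuming the fraction is simply scaled to twelfths of a year and truncated. The helper `frac_year_to_month` is hypothetical; parttime does not currently parse fractional years.

```r
# Hypothetical helper: map a fractional year to a "YYYY-MM" string,
# treating the fraction as twelfths of a year (0.0 -> Jan, 0.5 -> Jul).
frac_year_to_month <- function(x) {
  year  <- floor(x)
  month <- floor((x - year) * 12) + 1
  sprintf("%04d-%02d", as.integer(year), as.integer(month))
}

frac_year_to_month(2003.5)
# "2003-07"
```

Everything finer than the month is deliberately dropped, which matches the idea of leaving all other parts missing.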

Alternatively, these could throw a warning and return NAs like as.parttime("1999-W04"), which may be ambiguously in Jan or Feb. The warning encourages the use of timespans, rather than a parttime object, to represent non-calendar-date missingness, which could also allow as.timespan("2007.5") to map to a timespan of 2007-(365 * 0.5+/-0.05). This is a bit more involved than I think is necessary for where the package is right now, but if there's a use case for it at some point, I'm not opposed to introducing logic around it.
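One possible reading of that timespan, sketched in base R: treat "2007.5" as 0.5 +/- 0.05 of a 365-day year and turn the two bounds into dates. The helper `frac_year_span` is hypothetical, and the +/- 0.05 half-width (half of the last written digit) is an assumption about the notation above, not parttime behavior.

```r
# Hypothetical helper: bounds of a fractional year, assuming a 365-day year
# and an uncertainty of +/- half of the last written decimal digit.
frac_year_span <- function(year, frac, half_width = 0.05) {
  origin <- as.Date(sprintf("%04d-01-01", as.integer(year)))
  # Whole-day offsets from Jan 1 for the lower and upper bounds
  lower <- floor(365 * (frac - half_width))
  upper <- floor(365 * (frac + half_width))
  list(start = origin + lower, end = origin + upper)
}

span <- frac_year_span(2007, 0.5)
format(span$start)  # "2007-06-14"
format(span$end)    # "2007-07-20"
```

The result spans roughly five weeks around mid-year, which shows why a single month may under- or over-state what "2007.5" was meant to convey.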

It's worth noting that fractional years/months/weeks/days/hours currently aren't parsed, though fractional minutes/seconds are. The latter just happened to be handled already by the regex, which was inspired by the parsedate package's ISO handling. The same logic could be extended to account for the other fractional formats. Personally, I haven't encountered these forms in the wild, but if they're useful, please open an issue requesting to introduce them.

Going to mark as closed, but I'm very glad to have these assumptions challenged.