dgkf / parttime

Work-in-progress R package for handling partial datetimes
https://dgkf.github.io/parttime

Formatting of missing in the middle components is confusing #47

Open · bundfussr opened this issue 1 year ago

bundfussr commented 1 year ago

The formatting of dates where a component in the middle is missing is confusing. For example:

> as.parttime("2020---02", format = parse_cdisc_datetime)
<partial_time<YMDhms+tz>[1]> 
[1] "2020-00-02" 

I would expect "2020-NA-02" or "2020---02".

dgkf commented 1 year ago

Thanks - I agree it looks a bit odd right now. At least when the output supports color, the missing fields are highlighted in red. This is the same styling used for trailing missing fields when options(parttime.print_verbose = TRUE) is set:

options(parttime.print_verbose = TRUE)
as.parttime("2020---02", format = parse_cdisc_datetime)

However, when color output is not supported, even that cue is lost.

I'd like to explore other display options. I would prefer not to use the string "NA" in the middle of parttimes because not all fields are two characters wide. Whatever is used should occupy the same width as a well-formed string, so that columns of data are easy to scan. For example, a missing year would be "NA-01", which is hard to interpret without the usual field-length cues. As for "2020---02", I would like to avoid a single dash for the same reason.

But I would consider using ? instead, giving "2020-??-02". How does that sound to you?

Just for prioritization, is this a particularly common use case? It sounded like this was within spec, but rarely used.

bundfussr commented 1 year ago

Using ? sounds good to me.

I think it is rare for a component in the middle to be missing, but tools like admiral have to cover it.

trevorld commented 1 year ago

The Extended Date Time Format (EDTF) is an ISO 8601 extension that uses X to mark unspecified components, e.g. 2020-XX-02, and reserves ? for a different purpose (marking a value as uncertain).

dgkf commented 1 year ago

Thanks @trevorld, then I think XX might be the best option (e.g., "2020-XX-02"), since there's some precedent for what it represents. I'd still lean toward using the NA pillar styling, but this will make missingness less ambiguous when the terminal doesn't support color output.
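A minimal sketch of the fixed-width "XX" placeholder idea, using base R only. The helper `format_partial_date()` is hypothetical and is not part of parttime's API; it assumes the year, month and day are passed as separate numeric values (NA when missing) and pads each missing field with X to that field's usual width, so every string has the same length as a complete date.

```r
# Hypothetical helper, not part of parttime: format year/month/day fields,
# replacing each missing (NA) field with "X" repeated to the field's width
# so that every string keeps the same length as a complete date.
format_partial_date <- function(year, month, day) {
  pad_field <- function(x, width) {
    # start with an all-"X" placeholder for every element
    out <- rep(strrep("X", width), length(x))
    known <- !is.na(x)
    # zero-pad the known values to the field width, e.g. 2 -> "02"
    out[known] <- sprintf(paste0("%0", width, "d"), as.integer(x[known]))
    out
  }
  paste(pad_field(year, 4), pad_field(month, 2), pad_field(day, 2), sep = "-")
}

format_partial_date(2020, NA, 2)
#> [1] "2020-XX-02"
format_partial_date(c(2020, NA), c(NA, 1), c(2, 15))
#> [1] "2020-XX-02" "XXXX-01-15"
```

Because each placeholder matches its field's width, a missing year prints as "XXXX" rather than a two-character "NA", which keeps printed columns aligned.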