Closed dgkf closed 2 years ago
Thanks for thinking through these. You've thought of most of the cases that I've thought of, and I agree with your methods. One other consideration is how do you add to these times?
As an example, what is 2022-NA-04
plus 4 days? That would be pretty clearly 2022-NA-08
.
But, what is 2022-NA-28
plus 1 day? That would depend on the month. In February, it would be March. In any other month, it would be 2022-NA-29
. But it would also have a clear effect on the minimum and maximum since those would now clearly be 2022-01-29
and 2022-12-29
.
My best guess is that addition is not supported or that it would need very specific requirements. I brought it up as something that I'd thought of before and wanted to be sure it was considered, at least.
But, what is 2022-NA-28 plus 1 day?
Thanks for highlighting this. It's definitely non-trivial and gets even further complicated by leap years/seconds.
At least as it is now, if a value is imputed with an invalid date it gets squished to the valid range.
> impute_time(as.parttime("2022--29"), "2022-02-02")
<partial_time<YMDhms+tz>[1]>
[1] "2022-02-28"
I haven't ended up in this sort of situation myself, and my suspicion is that there could end up being quite a few competing "correct" behaviors in this space.
Without knowing more about all the ways this might affect an analysis, I'd lean toward leaving this as undefined behavior and suggest that a user first separate the datetime components themselves and apply use-case-specific logic. But I could certainly add a utility function to check whether any datetimes in a vector fall into this category where they have missingness in the middle.
Does parttime
support leap seconds accurately? (By "accurately", I mean only allowing for real leap seconds? And also only allowing them in time standards that support them which I think means limiting them to TAI time?) I ask because don't think that CDISC supports leap seconds.
Does parttime support leap seconds accurately?
@billdenney sorry - no it doesn't! I was just throwing that out as an example, but it's currently totally ignored.
When I wrote this, I was using lubridate
to handle leap-anything, which only handles years. Now we have the tzdb
package, which gives us access to a tz
's leap-seconds database, so it would be possible (but perhaps unnecessary for this use case).
For leap seconds, I think it would be preferred to ignore them in clinical settings because people would be surprised by 86401 second-long days and "60" in the seconds place would be an error.
@billdenney - Thanks for the heads-up. I think it would be a nice-to-have, but definitely isn't a priority, and if I ever do go down that path I'll be sure to expose some mechanism of disabling it entirely.
Just a status update. Missing-in-the-middle will now parse into a parttime, but I haven't rigorously tested how they are handled by any of the functions or operators that can apply to parttimes. For now, parsing a missing-in-the-middle time emits a warning that these types of dates are experimental.
Great! What would be most helpful for testing?
Things that occur to me are: Add test cases, define rules for operation (which you seem to have above), any other things?
My opinions are that missing-in-the-middle date-times should behave similarly to truncated partial times. Overall, it hangs everything on the min()
and max()
functions and then moves to normal date and date-time handling with the exception of equality.
POSIXct
and Date
objects.
pmin()
or pmax()
on two vectors).==
) would generally be FALSE
, unless the values and missingness were identical.max()
and min()
(x < y
being TRUE
requires that max(x) < min(y)
; being FALSE
requires that min(x) > max(y)
; any other scenario would be `NA)x + 1 day
would be the time range of min(x) + 1 day
to max(x) + 1 day
.min()
and max()
would be fully-defined date-times, lubridate
or any other base R date-time handling come to the rescue here)Finally getting around to taking a crack at this.
Generally I agree with the approach you laid out. There are just a few points that I would implement differently.
Equality (==) would generally be FALSE, unless the values and missingness were identical.
Like NA == NA
returns NA
, even if the missingness is identical, equality between two parttimes with missigness should be NA
. If parttimes represent a range of possible values, then they aren't guaranteed to be equal. This can be coerced to TRUE
using possibly()
.
min()
andmax()
would be fully-defined date-times
I think we're saying the same thing and I'm reading min
and max
here as pseudocode (I started this, so I think this point of confusion is my doing), but just to put a fine point on it, min
and max
the R functions do not perform imputation. Those are handled via impute_time_min
/_max
. The implementation for min
/max
should mirror the logic of min
/max(<numeric>)
and retain the missingness.
Otherwise I think everything looks good. I'm writing up tests and will try to identify any gaps that need to be addressed to make sure this is supported.
Going to mark this as done, since the remaining features for difftimes aren't even supported for parttimes yet. I'll kick off a separate issue that is specifically about these cases.
As mentioned in https://github.com/pharmaverse/admiral/issues/1315, there are situations where dates might have "missing-in-the-middle" missingness - that is, a coarse datetime field is missing, but some of the finer fields are still defined (eg "2022-NA-04").
I'll try my best to articulate what I think would be expected behaviors, but I would appreciate any input if there's another way to look at things (@billdenney - your input would be especially valued here).
Comparison Operators
2022-NA-04
>2022-05-04
NA
(month could be gt or lt)2022-NA-04
>2022-01-01
TRUE
(even minimum imputed month cant be lt)2022-NA-04
>2022-12-31
FALSE
(even maximum imputed month cant be gt)Subtraction (creating partial timespans)
The rough process looks like this:
2022-NA-04
-2021-01-01
1year 3days
1year 331days
1year 117days +/- 114days
2022-NA-01
-2022-NA-04
-337days
331days
3days +/- 334days
min/max
I think this one is straightforward - just use largest/smallest imputed possible times.
max(2022-NA-04, 2022-07-01)
2022-12-04
max(2022-NA-05, 2022-12-31)
2022-12-31