Open brookslogan opened 2 years ago
"Discuss" will eventually mean running sketches of use by some potential users / other developers (e.g., Evan, Jacob).
As discussion on #170 and #171 has brought up, another naming option for the relevant output column(s), besides `time_value`, `version`, and both, is `ref_time_value`.
Some remaining discussion points from #146:
- Separate discussion: should we rename `ref_time_values` and `time_value` output column to something involving `version`, or keep the former and have the latter turn into two duplicate columns with both names? Should we output an `epi_archive`?
  [or should we call the time/version output column `ref_time_value`?]
- Separate discussion: should we rename `max_version` parameter of `epix_slide` to `version`?
  [Since we're moving to more consistently use an "implicit versioning" scheme, where last-version-of-each-observation-carried-forward is assumed everywhere in archives, this may make sense. However, we might then need to think about the naming or discussion of the `$DT$version` column.]
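To make the "implicit versioning" assumption concrete, here is a minimal sketch of last-version-carried-forward lookup. It is a toy in plain Python, not the epiprocess implementation: the `updates` layout and this `as_of` function are hypothetical stand-ins chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical toy archive (NOT the epiprocess data structure):
# each (geo, time_value) key maps to its (version, value) updates, sorted by version.
updates = {
    ("ca", 1): [(2, 10.0), (5, 12.0)],
    ("ca", 2): [(3, 20.0)],
}

def as_of(updates, version):
    """Snapshot under implicit versioning: for each key, carry forward the
    latest update whose version is <= the requested version."""
    snapshot = {}
    for key, revisions in updates.items():
        versions = [v for v, _ in revisions]
        i = bisect_right(versions, version)
        if i > 0:  # the key had been reported at least once by `version`
            snapshot[key] = revisions[i - 1][1]
    return snapshot

# Version 4 was never observed directly (updates exist only at 2, 3, and 5),
# but the snapshot is still well defined because earlier values carry forward.
snapshot_v4 = as_of(updates, 4)
```

Under this scheme any requested version yields a snapshot, which is part of why a single `version`-style argument name may read naturally.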
Some other existing mismatches between slide operations that we might want to think about:
- `mutate`-like vs `summarize`-like
  - `epi_slide` is like `mutate`: it keeps existing columns and, given scalar output from `f`, broadcasts each slide result to the associated input locations to maintain size stability
  - `epix_slide` is like `summarize`: it only produces the grouping columns + `f` results, and ~doesn't broadcast~ broadcasts differently
- `n` -> `before`, `after` / `before` hits:
  - `before=k`, `after` missing in `epi_slide` means a trailing/right-aligned window that will actually have data at that right side of the window (as we ensure `ref_time_values %in% unique(x$time_value)`), unless there are some variable-time-values-by-group things to think about
  - `before=k`, `after` missing/not-accepted-as-an-arg in `epix_slide` means a window extending infinitely far into the future, but in typical surveillance data cases, will only contain data up to some time before the associated `ref_time_value`; to call it trailing/right-aligned doesn't seem precise either way.
- Advanced usage:
  - `d` (target dates `>= d`) but using data as of some other version `v < d` (regular surveillance will only be available for time values `< v`).
  - `epix_slide` over datetime `ref_time_values` corresponding to a forecast pipeline schedule.
- Compactify compatibility
  - Alternative to implicit versioning interface: explicit versioning interface
    - `as_of` would raise error or give NAs in between observed versions
    - `epix_slide` would require `ref_time_values` as between observed versions
    - `compactify` would likely have fewer issues/details/additions needed to keep the same behavior between compactified and uncompactified data
- Another idea to consider here: guess what label to use for the `ref_time_values` based on the user output: if they provide a(n epi)df with a `time_value` column, then use `version`; else use `time_value`.
See https://github.com/cmu-delphi/epiprocess/issues/146#issuecomment-1192785302; `time_value` vs `version` bullet point.
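As a rough illustration of the `mutate`-like vs `summarize`-like mismatch listed above, the following toy sketch contrasts the two output shapes on a single group. It is plain Python with hypothetical helper names (`trailing_window`, `slide_mutate_like`, `slide_summarize_like`, `label`), not the `epi_slide` or `epix_slide` API, and it ignores versioning and grouping columns entirely.

```python
def trailing_window(rows, t, before):
    """Rows whose time_value lies in the trailing window [t - before, t]."""
    return [r for r in rows if t - before <= r["time_value"] <= t]

def slide_mutate_like(rows, f, before):
    """Keep every input row and attach f's scalar result for the window
    ending at that row's time_value (size-stable output)."""
    return [
        {**r, "slide_value": f(trailing_window(rows, r["time_value"], before))}
        for r in rows
    ]

def slide_summarize_like(rows, f, before, ref_time_values, label="time_value"):
    """Emit one row per reference time holding only the label column and
    f's result; `label` stands in for the open column-naming question."""
    return [
        {label: t, "slide_value": f(trailing_window(rows, t, before))}
        for t in ref_time_values
    ]

def total_cases(window):
    return sum(r["cases"] for r in window)

rows = [
    {"time_value": 1, "cases": 1},
    {"time_value": 2, "cases": 3},
    {"time_value": 3, "cases": 5},
]

mutated = slide_mutate_like(rows, total_cases, before=1)
summarized = slide_summarize_like(
    rows, total_cases, before=1, ref_time_values=[3], label="version"
)
```

Here `mutated` has one row per input row with the original `cases` column retained, while `summarized` has one row per reference time and no input columns, so the name of its time/version column is the only handle a user has on which computation each row came from.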