Open rachlobay opened 2 years ago
as_of
[of the final result, if that's what we are talking about here] should probably be set to z$versions_end
[or we shouldn't output an epi_df
at all! "right" class is probably a generalized archive
object that doesn't require a geo_value
or time_value
column]. (This is the last version we "observed", and basically acts like the archive's "version". In most cases this is equivalent to max(z$DT$version)
; when it's different, it will be greater than this max value and express that we observed additional versions that contained no updates.)
[~On the other hand, maybe it should be max(ref_time_values)
, which would be automatic if we were to make rbind
, bind_rows
, etc. on epi_df
s use the max as_of
rather than the first one. Current ?dplyr::dplyr_extending
may not make this easy.~ While we do want this fix as well, the output for this function should probably be based on versions_end
, because if, say versions_end
overwrote/clobbered some version data from max(ref_time_values)
, it'd be more appropriate to label the output as_of
using the former rather than the latter. --- Counterpoint: user may use a single ref_time_value
at a time, and expect as_of
of the output to be that ref_time_value
, not the versions_end
. And default slide range should go through versions_end
, so there's no conflict between the two for the default, and the user might understand that things will change if they change the ref_time_values
. But we also need to consider all_versions = TRUE
, where things might change. These questions hint that an epi_df
might not be the right output class for a multi-ref_time_value
epix_slide
.]
If we are talking about the input to the group computations, then yes, it should be what we already get out of $as_of
; the corresponding ref time value / fc time value / version. [This is in line with simulating production forecasting at previous dates, assuming that on forecast time value fctv, we were able to observe version fctv for all data sources involved, and then made a forecast based on the set of targets defined for forecast time value fctv. There could be some more special situations, but thinking about how to handle them seems like a separate issue.]
This is related to the discussion of what the output class and column names of epix_slide
operations should be (#163, #49, #259, ...). Before we consider those larger changes, I plan to make an incremental improvement / temporary decision by assigning the output (if it's an epi_df
) an as_of
equal to the input archive's versions_end
.
time_value
column; this just keeps around a different type of information (a proxy for the archive's as_of
--- we might consider adding a timestamp as_of
to archives and this might make even more sense, though it gets more into the details of version-clobbering which we want to hide (#176))as_of
to change when expanding/changing the ref_time_values
set, unless the user is working with one ref_time_value
at a time.all_versions=TRUE
later allows epi_df
output; one would expect the ref_time_value
for a computation, which will be the versions_end
of the computation's input archive, to be the as_of
of its output, not some shifting around based on the ref_time_values
of whatever nested epix_slide
-ing it does.(We might also consider a special value of as_of
indicating that it mixes data from multiple versions. Again, relates to #242.)
--- CURRENT STATE: ---
epi_df
s from epix_slide()
, to avoid the clearly bad behavior described in the first comment.epi_df
s etc. still exist; however, even if they were fixed, we might still need some special logic in epix_slide()
(bugfixes might get us max of the ref_time_values
; we'd need to decide if that's right).epi_df
even with the bugfixes. Related: time_value vs. version output column naming discussion.
As originally stated in Issue #208, after making the change to allow
epix_slide
toepi_df
so that we can slide a forecaster usingarx_forecaster()
over anepi_archive
, it is good to consider whether the resulting metadata (namely,as_of
) is what we want.In the below example,
as_of
is related to the firstfc_time_values
2020-08-01… but do we really want that? Does it make the most sense to use here?