Closed brookslogan closed 1 year ago
Another thought: do we want to rename epix_slide
to epix_version_slide
?
We should try to avoid using tsibble::guess_frequency
if possible for now. Another issue with the full_seq
suggestion is that it could be altered by compactify; we'd at least want to document this, by making sure compactify
descriptions don't promise full compatibility and noting specifically for the ref_time_values
/new-name documentation. This might also lead us to think of a different post-compactify
format that records other versions that had updates reported. (This also hints at making epi_archive
provide a different interpretation: the current setup is as if we continuously and perfectly monitored some data source for updates within some time range; an alternative setup is to say that we only observed a certain set of versions. It even more hassle for a user to provide this version set explicitly, but we could default to something like the default I was describing. This alternative interpretation would mean that as_of should not allow us to ask for versions that weren't measured or should return NAs. It's a bit more complex, but also more precise/realistic.)
Thanks for raising these! I'd like to talk some of these through before you guys get too far along to understand what the problems are.
I tried to design epix_slide()
to be as close as possible to epi_slide()
, thinking this was a feature, rather than a bug. Thus the way I see it is that the name of the function itself, and all its argument names, being close is a good thing.
Hence I like items 1 and 4 in your todo list, but I'm not sure I like the others which are suggesting we rename things (items 2, 3, 5).
Thinking back again about the output type for epix_slide()
(item 6) is a good idea, though I'm not sure we'll arrive at any obvious solutions. I'd be happy to be wrong though.
Should we discuss on a call or here on GitHub? Call might be more fluid. We could combine it with discussing #64 #67 whenever you're ready.
We can discuss on call.
Recording a couple vague thoughts to hash out then:
n
is to try to make it "true", and change the code to given them n
time values... but that will be at some varying lag from the queried version, so the user will then need to be aware of and watch out for that...fill_through_version(archive, fill_versions_end, how="na")
will also mess up the states version default if the spacing is not the smallest possible for a given class. This could be changed to involve frequency guessing, but then its behavior would also be affected by compactify. (If we had a /set/ (not a clopen range) of observed versions + said that things were unobserved between those, then this could be a no op.)How would item 6 even work though? I don't see this happening, unless we also the time_value
feature along with the version
feature.
Also, for item 1, ref_time_values should actually be ref_versions.
Finally, epix_slide
won't grab the latest version; however, this is a bug beyond the scope of this update.
I'll leave it up to you Logan to schedule a call whenever you feel is the right time. We can discuss on the call since I think that's the most fluid, given that see some of these ideas differently (and maybe we'll clarify some confusion along the way, from either side).
Summary from call:
n
problem:
n
value, in both epix_slide
and epi_slide
n
with [before
in epix_slide
and] before
and after
in ~epix_slide
and~ epi_slide
. n=7
would act like before=6, after=0
. [before
should have no default in either function. after
should have a default of 0
in epi_slide
(and doesn't exist in epix_slide
).] [Edits: corrections to match Ryan's description below.]ref_time_values
and time_value
output column to something involving version
[--- or as_of
, to match epi_df
metadata?], or keep the former and have the latter turn into two duplicate columns with both names? Should we output an epi_archive
?max_version
parameter of epix_slide
to version
[or as_of
, to match epi_df
metadata?]?...
args (their own functions' parameter names or their own column names)I don't think many of the tasks in this issue still apply. Closing this and opening other issues.
Thanks for the summary. That all sounds right except for one detail:
Default after should be 0 in epi_slide() and I think after shouldn't even be present as an arg in epix_slide().
(Subtleties for forecast data can be discussed and possible modifications made later, as part of the time vs version discussion)
ref_time_values
default to something involving versions, not all time values;full_seq
up toobserved_versions_end
with period determined usingtsibble::guess_frequency
or an alternative to avoid potential licensing complications~ the default value setting is now https://github.com/cmu-delphi/epiprocess/issues/153; the renaming will be part of a different discussion/issueref_time_values
to something involving "version"~ part of separate discussionn
to something that doesn't imply it's counting something~ replace withbefore
(+add issue about dot prefixes)n
default value (e.g., 7 doesn't make sense; we won't get 7 time values in typical data sources)time_value
column in output toversion
~ part of separate discussionepi_archive
... but we might need to invent geo and time columns...~ part of larger discussion