Closed elray1 closed 2 years ago
Yep, great idea, again @elray1. Thanks! Please check out my reply on #10, and see if you agree with where I ended up, then I can try to address the current suggestion in a refactor as well.
One note I forgot to mention on #10, applies equally-well here. I think making the structure more flexible as in #10 and the current issue is going to force us to forgo inheriting from tsibble
, see #7. At least in the most general case (with multiple columns representing signals and units). I think tsibble
has a single key
and index
. But I haven't read about it in detail. Maybe you could check out tsibble
and see if you agree that it'd be too rigid? Would be good to know that we'd be forgoing that if we do indeed refactor as suggested here and on #10.
It looks like the key
in tsibble
can support multiple columns, so I think it should work ok. Maybe we should use the key
terminology to match what's already done in tsibble
rather than the unit
terminology i suggested in this issue.
Great! Thanks for checking that. So it looks the index
column in tsibble
must remain a single column (time_value
for us) but I guess that's not a big deal.
So tsibble
offers the flexibility we need then. And then it looks like then a single big refactor can be used to address #12, #10, and #7 all at once.
One minor question before I attempt anything, which I should probably ask on #7, but just for convenience let me ask here. Should we keep the column naming scheme index
and key
to be exactly as in tsibble
? Meaning, no time_value
and geo_value
named columns?
Yeah, I guess I have a weak preference to generally match what's being done for tsibble and avoid introducing new terms.
This sounds good to me. So the function would then be named slide_by_key
?
Another type of "unit" to consider: forecasting tasks. E.g., in the AR example, ahead
should be another part of the key; if we tried to use slide_by_geo
on rbound results for multiple aheads, I would expect it to break / give invalid results.
Just a heads up that I'm going to close this with #14.
The main example I'm imagining supporting is flusurv hospitalizations, which report hospitalization rates for each combination of location and age group. For example, a user might want to be able to apply sliding functions to each series corresponding to one combination of these factors. I think supporting this could happen with two changes:
unit_ids
when creating anepi_signal
. This should be a character vector of column names in the data set, and would be stored in the metadata. This could default to"geo_value"
to match current behaviors. A user working with flusurv data might specifyunit_ids = c("geo_value", "age_group")
.slide_by_geo
toslide_by_unit
? And in that function change thegroup_by
to group by the id columns stored in the metadata.