cmu-delphi / epiprocess

Tools for basic signal processing in epidemiology
https://cmu-delphi.github.io/epiprocess/

Change `epix_slide` to be more `group_modify`-like #275

Closed brookslogan closed 1 year ago

brookslogan commented 1 year ago

Migrated from #64.

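Foreseen common use cases of `epix_slide` are pseudoprospective forecasting, extracting version and time lags of functions of the data, and computing summary statistics.

Note that: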

  1. For pseudoprospective forecasting, we typically want multiple rows per group-reftime, since we may be predicting multiple targets, multiple quantiles, etc. These rows are distinguished by new key columns introduced in the slide computation (e.g., the target date and quantile level), not by the old non-grouping parts of the epikey (e.g., the `geo_value` if we are fitting all geos simultaneously). We might want to output a different epikey set if we are performing or tacking on a geo/age/etc. aggregation.
  2. For extracting version and time lags of functions of the data, we probably want either (a) exactly 1 row or (b) 0 or 1 rows per group-reftime, with multiple columns for the different functions, lags, etc. We might want to output a different epikey set if we are performing a geo/age/etc. aggregation.
  3. For summary statistics, we might want to output any number of rows, depending on the analysis.

Except perhaps for 2(a) with no epikey aggregation, we don't want the 1-per-reftime-epikey or 1-per-epikey-broadcast-to-epikeys row behavior in any of these cases.

So, we should make `epix_slide` more `reframe`/`group_modify`-like than grouped-`mutate`-like. (Its output column behavior is already like the former; this change would focus on row handling.)
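
To make the row-handling distinction concrete, here is a minimal, language-neutral sketch in plain Python. It is not the epiprocess API: `mutate_like`, `group_modify_like`, `toy_forecast`, and the toy rows are all hypothetical names and data invented for illustration, and the example groups by `geo_value` only, ignoring versions and reference times.

```python
from itertools import groupby
from operator import itemgetter

# Toy rows; column names are illustrative assumptions, not real epiprocess output.
rows = [
    {"geo_value": "ca", "time_value": 1, "cases": 10},
    {"geo_value": "ca", "time_value": 2, "cases": 14},
    {"geo_value": "tx", "time_value": 1, "cases": 7},
    {"geo_value": "tx", "time_value": 2, "cases": 9},
    {"geo_value": "tx", "time_value": 3, "cases": 11},
]


def by_group(rows, key):
    """Yield (key value, list of member rows) for each group, in sorted key order."""
    keyfunc = itemgetter(key)
    for value, members in groupby(sorted(rows, key=keyfunc), key=keyfunc):
        yield value, list(members)


def mutate_like(rows, key, f):
    """Grouped-mutate-like: f returns one value per group, broadcast to every
    input row, so the output always has exactly one row per input row."""
    out = []
    for _, members in by_group(rows, key):
        value = f(members)
        out.extend({**row, "slide_value": value} for row in members)
    return out


def group_modify_like(rows, key, f):
    """group_modify/reframe-like: f returns any number of new rows per group
    (possibly zero); the output is the group key plus whatever f returned."""
    out = []
    for value, members in by_group(rows, key):
        out.extend({key: value, **new_row} for new_row in f(members))
    return out


def mean_cases(members):
    return sum(r["cases"] for r in members) / len(members)


def toy_forecast(members):
    """Hypothetical forecaster: one row per (ahead, quantile_level) pair."""
    last = members[-1]["cases"]
    return [
        {"ahead": ahead, "quantile_level": q, "value": last * (1 + (q - 0.5) * ahead)}
        for ahead in (1, 2)
        for q in (0.1, 0.5, 0.9)
    ]


broadcast = mutate_like(rows, "geo_value", mean_cases)          # 5 rows, same as input
forecasts = group_modify_like(rows, "geo_value", toy_forecast)  # 6 rows per group
```

In the mutate-like version the output row count is tied to the input rows, which is what use cases 1 and 3 above want to avoid. In the `group_modify`-like version the computation alone decides how many rows come back for each group, and the new key columns (`ahead`, `quantile_level`) replace the non-grouping input keys.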

brookslogan commented 1 year ago

Closed by #311.