cmu-delphi / epiprocess

Tools for basic signal processing in epidemiology
https://cmu-delphi.github.io/epiprocess/
Other
13 stars 9 forks source link

`epi_archive$other_keys` is empty #459

Open dsweber2 opened 3 months ago

dsweber2 commented 3 months ago

Apparently not intended behavior. The underlying DT has the right keys, it's just not recorded in metadata at all

Working example

dv <- pub_covidcast(
  source = "doctor-visits",
  signals = "smoothed_adj_cli",
  geo_type = "state",
  time_type = "day",
  geo_values = "ca,fl,ny,tx",
  time_values = epirange(20200601, 20211201),
  issues = epirange(20200601, 20211201)
)
archive <- dv %>%
  select(geo_value, time_value, version = issue, lag, percent_cli = value) %>%
  as_epi_archive(compactify = TRUE, other_keys = c("lag"))
> archive$DT
Key: <geo_value, time_value, lag, version>
        geo_value time_value    version   lag percent_cli
           <char>     <Date>     <Date> <num>       <num>
     1:        ca 2020-06-01 2020-06-06     5    2.140116
     2:        ca 2020-06-01 2020-06-07     6    2.140116
     3:        ca 2020-06-01 2020-06-08     7    2.140379
     4:        ca 2020-06-01 2020-06-09     8    2.114430
     5:        ca 2020-06-01 2020-06-10     9    2.133677
    ---                                                  
117144:        tx 2021-11-23 2021-11-29     6    2.159704
117145:        tx 2021-11-24 2021-11-27     3    2.024028
117146:        tx 2021-11-24 2021-11-29     5    1.937911
117147:        tx 2021-11-25 2021-11-29     4    1.866631
117148:        tx 2021-11-26 2021-11-29     3    1.858596

> names(archive)
[1] "DT"                         "geo_type"                  
[3] "time_type"                  "additional_metadata"       
[5] "clobberable_versions_start" "versions_end"              
brookslogan commented 2 months ago

The other_keys are reflected in the DT's key. Does seem a bit unnatural to access. The solution may be an other_keys() or get_other_keys() method that works on both epi_dfs and epi_archives. I'd also like an epi_keys() / get_epi_keys() method, but that already exists in epipredict with a different meaning than I would have assigned (time_value is included for epi_dfs, and maybe there's not an archive impl yet), and not sure what other name to use for what I'd have called epi_keys (= geo_value, !!!other_keys for both epi_dfs and epi_archives).

Is there anywhere internally where we try to pull $other_keys / [["other_keys"]]? I don't see any right now in (a recent version of) epiprocess.

dsweber2 commented 2 months ago

The solution may be an other_keys() or get_other_keys() method that works on both epi_dfs and epi_archives

remembering attributes(x)$metadata always adds friction for me, so I'd definitely like this feature.

what other name to use for what I'd have called epi_keys

maybe non_time_keys? fwiw, I would have assumed epi_keys meant the entire index, and not just some subset of it.

Is there anywhere internally where we try to pull $other_keys / [["other_keys"]]? I don't see any right now in (a recent version of) epiprocess.

I also can't find any examples for epi_archive. There's a couple for epi_df:

./R/key_colnames.R:27:  other_keys <- attr(x, "metadata")$other_keys
./R/key_colnames.R:33:  other_keys <- attr(x, "metadata")$other_keys
./R/methods-epi_df.R:31:  if (missing(key)) key <- c("geo_value", attributes(x)$metadata$other_keys)
./R/methods-epi_df.R:165:  old_other_keys <- attr(template, "metadata")$other_keys
./R/methods-epi_df.R:166:  attr(res, "metadata")$other_keys <- old_other_keys[old_other_keys %in% cn]
./R/epi_df.R:307:    additional_metadata$other_keys <- unique(
./R/epi_df.R:308:      c(additional_metadata$other_keys, tsibble_other_keys)