cmu-delphi / covidcast

R and Python packages supporting Delphi's COVIDcast effort.
https://delphi.cmu.edu/covidcast/

Allow `issues = "*"` to get all issues of a single data point #260

Open ryantibs opened 3 years ago

ryantibs commented 3 years ago

Currently, as I understand it, there's no convenient way in the covidcast_signal() function (in the covidcast R package) to request all issues of a single data point. For example, to get all issues of the JHU deaths in PA on Sept 1, I could use:

```r
covidcast::covidcast_signal(
  data_source = "jhu-csse", signal = "deaths_incidence_num",
  geo_type = "state", geo_values = "pa",
  start_day = "2020-09-01", end_day = "2020-09-01",
  issues = c("2020-09-01", "2020-11-10")
)
```

It would be convenient if I could set issues = "*" to return the same thing. The same goes for Python.

Tagging @sarah-colq @chinandrew to draw their attention. This is pretty low priority, but should be an easy fix.

capnrefsmmat commented 3 years ago

The Epidata API actually doesn't support fetching all issues, so this will have to be supported on the server side before clients can add it. cc @krivard to add this to the Epidata wishlist.

ryantibs commented 3 years ago

Why couldn't this just be interpreted within the covidcast_signal() function as issues = c(start_day, Sys.Date())?

When the Epidata API itself allows for issues = "*", we can use that instead.
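A minimal client-side sketch of that interpretation, written in Python since the same behavior is wanted there. The helper `resolve_issues` is hypothetical and not part of either covidcast package; it only expands `"*"` into the `(start_day, today)` pair proposed above and passes every other value through unchanged, so existing calls keep working.

```python
from datetime import date
from typing import Optional, Tuple, Union

# Hypothetical type for the issues argument: "*", a single date, a (first, last) pair, or None.
IssueSpec = Union[str, date, Tuple[date, date], None]


def resolve_issues(issues: IssueSpec, start_day: date,
                   today: Optional[date] = None) -> Union[date, Tuple[date, date], None]:
    """Hypothetical helper: expand issues="*" into the range (start_day, today).

    Any other value is returned unchanged.
    """
    if issues == "*":
        # Following the proposal above: start_day is the lower bound and
        # today's date (the R equivalent of Sys.Date()) is the upper bound.
        return (start_day, today if today is not None else date.today())
    return issues
```

For example, `resolve_issues("*", date(2020, 9, 1), today=date(2020, 11, 10))` returns the same range as the explicit `issues = c("2020-09-01", "2020-11-10")` call in the original report. The `today` parameter exists only to make the result reproducible; by default the current date is used.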

krivard commented 3 years ago

We probably need to use max(start_day, min_issue) since we have a bunch of signals whose first issue included data from multiple months beforehand. min_issue is not yet available in the metadata. There's a PR for it (cmu-delphi/delphi-epidata#236), but the logic is tricky and we don't want to inadvertently double the running time of the meta cache updates.
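A rough Python sketch of that lower bound, assuming a per-signal `min_issue` date could eventually be read from metadata. Both helpers are hypothetical. `issue_range` falls back to `start_day` when `min_issue` is unknown, which is the current situation. `to_epidata_range` assumes the Epidata API's `YYYYMMDD-YYYYMMDD` range syntax for list parameters such as `issues`.

```python
from datetime import date
from typing import Optional, Tuple


def issue_range(start_day: date, today: date,
                min_issue: Optional[date] = None) -> Tuple[date, date]:
    """Hypothetical helper: the issue range to request for issues="*".

    min_issue stands in for a signal's first issue date, which metadata does
    not expose yet; when it is None, start_day alone is the lower bound.
    """
    lower = start_day if min_issue is None else max(start_day, min_issue)
    return (lower, today)


def to_epidata_range(r: Tuple[date, date]) -> str:
    """Format a (first, last) date pair as an assumed "YYYYMMDD-YYYYMMDD" range string."""
    return f"{r[0]:%Y%m%d}-{r[1]:%Y%m%d}"
```

For a signal whose first issue was Sept 1, a request starting June 1 is narrowed so that `issue_range(date(2020, 6, 1), date(2020, 11, 10), min_issue=date(2020, 9, 1))` gives `(date(2020, 9, 1), date(2020, 11, 10))`, which formats as `"20200901-20201110"`.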

brookslogan commented 2 years ago

We use this kind of query to build epi_archives in epiprocess. In case this is relevant for v4 or for implementing issues="*" in the API, the call is:

covidcast(......., issues = epirange(12340101, 34560101))

krivard commented 2 years ago

@melange396 was this the usage you thought was causing performance problems?