cmu-delphi / covidcast

R and Python packages supporting Delphi's COVIDcast effort.
https://delphi.cmu.edu/covidcast/

`get_zoltar_predictions` fails using default arguments #589

Open nmdefries opened 1 year ago

nmdefries commented 1 year ago

Problem

get_zoltar_predictions fails on some very common calls, such as those using the default arguments (incidence_period = "epiweek" and all available signals), e.g.:

> get_zoltar_predictions("CMU-TimeSeries", forecast_dates = "2022-07-18")
get_token(): POST: https://zoltardata.com/api-token-auth/
get_resource(): GET: https://zoltardata.com/api/projects/
get_resource(): GET: https://zoltardata.com/api/project/44/timezeros/
[1] "Grabbing forecasts from Zoltar..."
Error: POST status was not 200. status_code=400, json_response=Invalid query. error_messages='["target with name not found. name=1 wk ahead inc hosp, valid names=['17 day ahead inc hosp', '17 wk ahead cum death', '17 wk ahead inc death', '18 day ahead cum death', '18 day ahead inc death', '2 wk ahead inc case', '101 day ahead inc hosp', ...

See https://github.com/cmu-delphi/covidcast/pull/587 for context.

This error is caused by improper target construction. Hospitalizations are forecast only on a daily basis (e.g. 1 day ahead inc hosp). When the hosp signal is requested and the epiweek incidence_period is selected, we construct invalid weekly hospitalization targets such as 1 wk ahead inc hosp. The epiweek period is selected either explicitly, with incidence_period = "epiweek", or implicitly through the default incidence_period = c("epiweek", "day"), which the function reduces to "epiweek" via match.arg.

The same error occurs for cases and deaths when incidence_period is set to "day".
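One way to avoid these invalid requests is to build targets only for the signal/period combinations Zoltar actually defines. The R sketch below is hypothetical, not the covidcast implementation: build_targets, valid_periods and period_unit are illustrative names, and the combination table simply encodes this issue's description (hospitalizations daily-only; cases and deaths weekly-only).

```r
# Hypothetical sketch, not covidcast code: build Zoltar target names such as
# "1 wk ahead inc death" only for combinations that exist.
# Assumption (from this issue): hospitalizations are forecast daily only,
# while cases and deaths are forecast weekly only.
valid_periods <- list(
  "inc hosp"  = "day",
  "inc case"  = "epiweek",
  "inc death" = "epiweek",
  "cum death" = "epiweek"
)

# Zoltar target names write the epiweek period as "wk".
period_unit <- c(epiweek = "wk", day = "day")

build_targets <- function(target_types, incidence_period, aheads) {
  targets <- character(0)
  for (type in target_types) {
    for (period in incidence_period) {
      # Skip combinations Zoltar does not define instead of requesting them.
      if (!(period %in% valid_periods[[type]])) next
      targets <- c(targets, paste(aheads, period_unit[[period]], "ahead", type))
    }
  }
  targets
}

build_targets(c("inc hosp", "inc death"), "epiweek", 1:2)
# "1 wk ahead inc death" "2 wk ahead inc death"
```

Silently dropping unsupported combinations is one design choice; raising an informative error when a requested signal has no valid period would be a reasonable alternative.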

Example calls

> get_zoltar_predictions("CMU-TimeSeries", forecast_dates = "2020-07-20", 
+        signal = c("confirmed_incidence_num", "deaths_incidence_num", "deaths_cumulative_num"), 
+        incidence_period = "epiweek")
get_token(): POST: https://zoltardata.com/api-token-auth/
get_resource(): GET: https://zoltardata.com/api/projects/
get_resource(): GET: https://zoltardata.com/api/project/44/timezeros/
[1] "Grabbing forecasts from Zoltar..."
# A tibble: 4,992 × 10
   ahead geo_value quantile value forecaster     forecast_date data_source signal               target_end_date incidence_period
   <int> <chr>        <dbl> <dbl> <chr>          <date>        <chr>       <chr>                <date>          <chr>
 1     1 al          NA       172 CMU-TimeSeries 2020-07-20    jhu-csse    deaths_incidence_num 2020-07-25      epiweek         
 2     1 al           0.01     46 CMU-TimeSeries 2020-07-20    jhu-csse    deaths_incidence_num 2020-07-25      epiweek         
 3     1 al           0.025    67 CMU-TimeSeries 2020-07-20    jhu-csse    deaths_incidence_num 2020-07-25      epiweek         
 4     1 al           0.05     83 CMU-TimeSeries 2020-07-20    jhu-csse    deaths_incidence_num 2020-07-25      epiweek         
 5     1 al           0.1     104 CMU-TimeSeries 2020-07-20    jhu-csse    deaths_incidence_num 2020-07-25      epiweek         
 6     1 al           0.15    117 CMU-TimeSeries 2020-07-20    jhu-csse    deaths_incidence_num 2020-07-25      epiweek         
 7     1 al           0.2     128 CMU-TimeSeries 2020-07-20    jhu-csse    deaths_incidence_num 2020-07-25      epiweek         
 8     1 al           0.25    137 CMU-TimeSeries 2020-07-20    jhu-csse    deaths_incidence_num 2020-07-25      epiweek         
 9     1 al           0.3     143 CMU-TimeSeries 2020-07-20    jhu-csse    deaths_incidence_num 2020-07-25      epiweek         
10     1 al           0.35    153 CMU-TimeSeries 2020-07-20    jhu-csse    deaths_incidence_num 2020-07-25      epiweek         
# … with 4,982 more rows
> get_zoltar_predictions("CMU-TimeSeries", forecast_dates = "2020-07-20", 
+        signal = c("confirmed_incidence_num", "deaths_incidence_num", "deaths_cumulative_num"), 
+        incidence_period = "day")
get_token(): POST: https://zoltardata.com/api-token-auth/
get_resource(): GET: https://zoltardata.com/api/projects/
get_resource(): GET: https://zoltardata.com/api/project/44/timezeros/
[1] "Grabbing forecasts from Zoltar..."
Error: POST status was not 200. status_code=400, json_response=Invalid query. error_messages='["target with name not found. name=1 day ahead inc case, valid names=['17 day ahead inc hosp', '17 wk ahead cum death', '17 wk ahead inc death', '18 day ahead cum death', '18 day ahead inc death', '2 wk ahead inc case', '101 day ahead inc hosp', '102 day ahead cum death', '102 day ahead inc death', '102 day ahead inc hosp', ...

Comparison to get_covidhub_predictions

This differs from the behavior of get_covidhub_predictions, which takes the default incidence_period = c("epiweek", "day") literally (contrary to its documentation) and fetches predictions for both period types. As a result, the two functions are not interchangeable.
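The difference comes down to how a vector default is resolved. The base R sketch below shows both behaviors of match.arg; pick_one and pick_all are hypothetical stand-ins, and using several.ok = TRUE is only one possible way to keep both periods, not necessarily how either covidcast function is written.

```r
# Hypothetical stand-ins illustrating base R's match.arg() on a vector default.
pick_one <- function(incidence_period = c("epiweek", "day")) {
  # With several.ok = FALSE (the default), an argument left at its default
  # resolves to the first choice only.
  match.arg(incidence_period)
}

pick_all <- function(incidence_period = c("epiweek", "day")) {
  # With several.ok = TRUE, the full default vector is kept.
  match.arg(incidence_period, several.ok = TRUE)
}

pick_one()       # "epiweek"
pick_all()       # "epiweek" "day"
pick_one("day")  # "day"
```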

Expected behavior

dshemetov commented 1 year ago

Just want to link this issue with #99 and #586. The first looks like a record of attempts to make Zoltar supersede our scraping functions; the second is the bug we had a few months back.