cmu-delphi / covidcast

R and Python packages supporting Delphi's COVIDcast effort.
https://delphi.cmu.edu/covidcast/

Appropriate behavior of `get_covidhub_predictions()` #223

Closed: dajmcdon closed this issue 3 years ago.

dajmcdon commented 4 years ago
ryantibs commented 4 years ago

Tiny comment: we may as well make get_covid_hub_forecast_names() a public (exported) function, since it would be useful to have. For consistency, I would also change covid_hub in this name to covidhub, and make that change throughout. Several argument names contain covid_hub rather than covidhub, and you can catch them by running `grep "covid_hub" evalcast/R/*.R` from the command line within the R-packages directory.

brookslogan commented 3 years ago

I am also confused by other aspects of this function's output and documentation. evalcast::get_covidhub_predictions("COVIDhub-baseline", as.Date("2020-09-07")) produces an unnamed list of 20 cards, some with 7-row forecast_distributions and some with 23-row forecast_distributions. Looking at the downloaded CSV, though, there appear to be only 16 distinct targets (12 of them "inc"-type targets), so I am not sure why there are 20 cards or what they represent. I am also not sure whether my package versions are responsible for the unnamed list or for any other issues.

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] evalcast_0.1.1  testthat_3.0.0  covidcast_0.3.0 tibble_3.0.4   
[5] pipeR_0.6.1.3  

loaded via a namespace (and not attached):
 [1] zoo_1.8-8         tidyselect_1.1.0  remotes_2.2.0     purrr_0.3.4      
 [5] lattice_0.20-41   colorspace_1.4-1  vctrs_0.3.4       generics_0.1.0   
 [9] usethis_1.6.3     utf8_1.1.4        rlang_0.4.8       pkgbuild_1.1.0   
[13] pillar_1.4.6      foreign_0.8-70    glue_1.4.2        withr_2.3.0      
[17] sp_1.4-4          sessioninfo_1.1.1 lifecycle_0.2.0   stringr_1.4.0    
[21] munsell_0.5.0     gtable_0.3.0      rvest_0.3.6       devtools_2.3.2   
[25] memoise_1.1.0     labeling_0.4.2    callr_3.5.1       ps_1.4.0         
[29] maptools_1.0-2    curl_4.3          fansi_0.4.1       Rcpp_1.0.5       
[33] readr_1.4.0       backports_1.1.10  scales_1.1.1      desc_1.2.0       
[37] pkgload_1.1.0     jsonlite_1.7.1    farver_2.0.3      MMWRweek_0.1.3   
[41] fs_1.5.0          ggplot2_3.3.2     hms_0.5.3         digest_0.6.27    
[45] stringi_1.5.3     processx_3.4.4    dplyr_1.0.2       rprojroot_1.3-2  
[49] grid_3.5.0        cli_2.1.0         tools_3.5.0       magrittr_1.5     
[53] crayon_1.3.4      tidyr_1.1.2       usmap_0.5.1       pkgconfig_2.0.3  
[57] ellipsis_0.3.1    xml2_1.3.2        prettyunits_1.1.1 lubridate_1.7.9  
[61] assertthat_0.2.1  httr_1.4.2        rstudioapi_0.11   R6_2.5.0         
[65] compiler_3.5.0   

> get_covidhub_predictions("COVIDhub-baseline", as.Date("2020-09-07"))
[[1]]
# A tibble: 3,142 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01001    <tibble [7 × 2]>     
 2 01003    <tibble [7 × 2]>     
 3 01005    <tibble [7 × 2]>     
 4 01007    <tibble [7 × 2]>     
 5 01009    <tibble [7 × 2]>     
 6 01011    <tibble [7 × 2]>     
 7 01013    <tibble [7 × 2]>     
 8 01015    <tibble [7 × 2]>     
 9 01017    <tibble [7 × 2]>     
10 01019    <tibble [7 × 2]>     
# … with 3,132 more rows

[[2]]
# A tibble: 57 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01       <tibble [7 × 2]>     
 2 02       <tibble [7 × 2]>     
 3 04       <tibble [7 × 2]>     
 4 05       <tibble [7 × 2]>     
 5 06       <tibble [7 × 2]>     
 6 08       <tibble [7 × 2]>     
 7 09       <tibble [7 × 2]>     
 8 10       <tibble [7 × 2]>     
 9 11       <tibble [7 × 2]>     
10 12       <tibble [7 × 2]>     
# … with 47 more rows

[[3]]
# A tibble: 57 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01       <tibble [23 × 2]>    
 2 02       <tibble [23 × 2]>    
 3 04       <tibble [23 × 2]>    
 4 05       <tibble [23 × 2]>    
 5 06       <tibble [23 × 2]>    
 6 08       <tibble [23 × 2]>    
 7 09       <tibble [23 × 2]>    
 8 10       <tibble [23 × 2]>    
 9 11       <tibble [23 × 2]>    
10 12       <tibble [23 × 2]>    
# … with 47 more rows

[[4]]
# A tibble: 3,142 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01001    <tibble [7 × 2]>     
 2 01003    <tibble [7 × 2]>     
 3 01005    <tibble [7 × 2]>     
 4 01007    <tibble [7 × 2]>     
 5 01009    <tibble [7 × 2]>     
 6 01011    <tibble [7 × 2]>     
 7 01013    <tibble [7 × 2]>     
 8 01015    <tibble [7 × 2]>     
 9 01017    <tibble [7 × 2]>     
10 01019    <tibble [7 × 2]>     
# … with 3,132 more rows

[[5]]
# A tibble: 57 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01       <tibble [7 × 2]>     
 2 02       <tibble [7 × 2]>     
 3 04       <tibble [7 × 2]>     
 4 05       <tibble [7 × 2]>     
 5 06       <tibble [7 × 2]>     
 6 08       <tibble [7 × 2]>     
 7 09       <tibble [7 × 2]>     
 8 10       <tibble [7 × 2]>     
 9 11       <tibble [7 × 2]>     
10 12       <tibble [7 × 2]>     
# … with 47 more rows

[[6]]
# A tibble: 57 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01       <tibble [23 × 2]>    
 2 02       <tibble [23 × 2]>    
 3 04       <tibble [23 × 2]>    
 4 05       <tibble [23 × 2]>    
 5 06       <tibble [23 × 2]>    
 6 08       <tibble [23 × 2]>    
 7 09       <tibble [23 × 2]>    
 8 10       <tibble [23 × 2]>    
 9 11       <tibble [23 × 2]>    
10 12       <tibble [23 × 2]>    
# … with 47 more rows

[[7]]
# A tibble: 3,142 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01001    <tibble [7 × 2]>     
 2 01003    <tibble [7 × 2]>     
 3 01005    <tibble [7 × 2]>     
 4 01007    <tibble [7 × 2]>     
 5 01009    <tibble [7 × 2]>     
 6 01011    <tibble [7 × 2]>     
 7 01013    <tibble [7 × 2]>     
 8 01015    <tibble [7 × 2]>     
 9 01017    <tibble [7 × 2]>     
10 01019    <tibble [7 × 2]>     
# … with 3,132 more rows

[[8]]
# A tibble: 57 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01       <tibble [7 × 2]>     
 2 02       <tibble [7 × 2]>     
 3 04       <tibble [7 × 2]>     
 4 05       <tibble [7 × 2]>     
 5 06       <tibble [7 × 2]>     
 6 08       <tibble [7 × 2]>     
 7 09       <tibble [7 × 2]>     
 8 10       <tibble [7 × 2]>     
 9 11       <tibble [7 × 2]>     
10 12       <tibble [7 × 2]>     
# … with 47 more rows

[[9]]
# A tibble: 57 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01       <tibble [23 × 2]>    
 2 02       <tibble [23 × 2]>    
 3 04       <tibble [23 × 2]>    
 4 05       <tibble [23 × 2]>    
 5 06       <tibble [23 × 2]>    
 6 08       <tibble [23 × 2]>    
 7 09       <tibble [23 × 2]>    
 8 10       <tibble [23 × 2]>    
 9 11       <tibble [23 × 2]>    
10 12       <tibble [23 × 2]>    
# … with 47 more rows

[[10]]
# A tibble: 3,142 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01001    <tibble [7 × 2]>     
 2 01003    <tibble [7 × 2]>     
 3 01005    <tibble [7 × 2]>     
 4 01007    <tibble [7 × 2]>     
 5 01009    <tibble [7 × 2]>     
 6 01011    <tibble [7 × 2]>     
 7 01013    <tibble [7 × 2]>     
 8 01015    <tibble [7 × 2]>     
 9 01017    <tibble [7 × 2]>     
10 01019    <tibble [7 × 2]>     
# … with 3,132 more rows

[[11]]
# A tibble: 57 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01       <tibble [7 × 2]>     
 2 02       <tibble [7 × 2]>     
 3 04       <tibble [7 × 2]>     
 4 05       <tibble [7 × 2]>     
 5 06       <tibble [7 × 2]>     
 6 08       <tibble [7 × 2]>     
 7 09       <tibble [7 × 2]>     
 8 10       <tibble [7 × 2]>     
 9 11       <tibble [7 × 2]>     
10 12       <tibble [7 × 2]>     
# … with 47 more rows

[[12]]
# A tibble: 57 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01       <tibble [23 × 2]>    
 2 02       <tibble [23 × 2]>    
 3 04       <tibble [23 × 2]>    
 4 05       <tibble [23 × 2]>    
 5 06       <tibble [23 × 2]>    
 6 08       <tibble [23 × 2]>    
 7 09       <tibble [23 × 2]>    
 8 10       <tibble [23 × 2]>    
 9 11       <tibble [23 × 2]>    
10 12       <tibble [23 × 2]>    
# … with 47 more rows

[[13]]
# A tibble: 3,142 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01001    <tibble [7 × 2]>     
 2 01003    <tibble [7 × 2]>     
 3 01005    <tibble [7 × 2]>     
 4 01007    <tibble [7 × 2]>     
 5 01009    <tibble [7 × 2]>     
 6 01011    <tibble [7 × 2]>     
 7 01013    <tibble [7 × 2]>     
 8 01015    <tibble [7 × 2]>     
 9 01017    <tibble [7 × 2]>     
10 01019    <tibble [7 × 2]>     
# … with 3,132 more rows

[[14]]
# A tibble: 57 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01       <tibble [7 × 2]>     
 2 02       <tibble [7 × 2]>     
 3 04       <tibble [7 × 2]>     
 4 05       <tibble [7 × 2]>     
 5 06       <tibble [7 × 2]>     
 6 08       <tibble [7 × 2]>     
 7 09       <tibble [7 × 2]>     
 8 10       <tibble [7 × 2]>     
 9 11       <tibble [7 × 2]>     
10 12       <tibble [7 × 2]>     
# … with 47 more rows

[[15]]
# A tibble: 3,142 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01001    <tibble [7 × 2]>     
 2 01003    <tibble [7 × 2]>     
 3 01005    <tibble [7 × 2]>     
 4 01007    <tibble [7 × 2]>     
 5 01009    <tibble [7 × 2]>     
 6 01011    <tibble [7 × 2]>     
 7 01013    <tibble [7 × 2]>     
 8 01015    <tibble [7 × 2]>     
 9 01017    <tibble [7 × 2]>     
10 01019    <tibble [7 × 2]>     
# … with 3,132 more rows

[[16]]
# A tibble: 57 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01       <tibble [7 × 2]>     
 2 02       <tibble [7 × 2]>     
 3 04       <tibble [7 × 2]>     
 4 05       <tibble [7 × 2]>     
 5 06       <tibble [7 × 2]>     
 6 08       <tibble [7 × 2]>     
 7 09       <tibble [7 × 2]>     
 8 10       <tibble [7 × 2]>     
 9 11       <tibble [7 × 2]>     
10 12       <tibble [7 × 2]>     
# … with 47 more rows

[[17]]
# A tibble: 3,142 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01001    <tibble [7 × 2]>     
 2 01003    <tibble [7 × 2]>     
 3 01005    <tibble [7 × 2]>     
 4 01007    <tibble [7 × 2]>     
 5 01009    <tibble [7 × 2]>     
 6 01011    <tibble [7 × 2]>     
 7 01013    <tibble [7 × 2]>     
 8 01015    <tibble [7 × 2]>     
 9 01017    <tibble [7 × 2]>     
10 01019    <tibble [7 × 2]>     
# … with 3,132 more rows

[[18]]
# A tibble: 57 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01       <tibble [7 × 2]>     
 2 02       <tibble [7 × 2]>     
 3 04       <tibble [7 × 2]>     
 4 05       <tibble [7 × 2]>     
 5 06       <tibble [7 × 2]>     
 6 08       <tibble [7 × 2]>     
 7 09       <tibble [7 × 2]>     
 8 10       <tibble [7 × 2]>     
 9 11       <tibble [7 × 2]>     
10 12       <tibble [7 × 2]>     
# … with 47 more rows

[[19]]
# A tibble: 3,142 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01001    <tibble [7 × 2]>     
 2 01003    <tibble [7 × 2]>     
 3 01005    <tibble [7 × 2]>     
 4 01007    <tibble [7 × 2]>     
 5 01009    <tibble [7 × 2]>     
 6 01011    <tibble [7 × 2]>     
 7 01013    <tibble [7 × 2]>     
 8 01015    <tibble [7 × 2]>     
 9 01017    <tibble [7 × 2]>     
10 01019    <tibble [7 × 2]>     
# … with 3,132 more rows

[[20]]
# A tibble: 57 x 2
   location forecast_distribution
 * <chr>    <list>               
 1 01       <tibble [7 × 2]>     
 2 02       <tibble [7 × 2]>     
 3 04       <tibble [7 × 2]>     
 4 05       <tibble [7 × 2]>     
 5 06       <tibble [7 × 2]>     
 6 08       <tibble [7 × 2]>     
 7 09       <tibble [7 × 2]>     
 8 10       <tibble [7 × 2]>     
 9 11       <tibble [7 × 2]>     
10 12       <tibble [7 × 2]>     
# … with 47 more rows

sgsmob commented 3 years ago

Yes, I was confused by the dimension mismatch too, but I assumed it was just something I didn't understand about our code.

dajmcdon commented 3 years ago

It's giving you cases/deaths/states/counties/ahead (1-5); I think this accounts for the 20. You can check which card is which by applying attributes() to any of the tibbles in the list. As to why some have 7 instead of 23 quantiles, I'm not sure. The expectation is that we submit all 23, but I don't know why some combinations would result in only a subset of those. Could the function be filtering the CSV incorrectly?
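To see at a glance what every card is, rather than inspecting them one by one, a small base-R helper could tabulate selected attributes of each card together with the number of quantiles in its forecast distributions. This is a minimal sketch under assumptions: `summarize_cards()` is a hypothetical helper (not part of evalcast), and it assumes each card carries scalar attributes named `ahead` and `geo_type` (the names used in this thread; evalcast's actual names may differ) plus a `forecast_distribution` list column, as in the printed output above.

```r
# Hypothetical helper, not part of evalcast: one row per card with the
# requested attributes, the number of locations, and the number of quantiles
# in the first location's forecast distribution.
# Assumes each requested attribute is a single value (or absent).
summarize_cards <- function(cards, attrs = c("ahead", "geo_type")) {
  rows <- lapply(seq_along(cards), function(i) {
    card <- cards[[i]]
    # Missing attributes become NA instead of raising an error
    vals <- lapply(attrs, function(a) {
      v <- attr(card, a, exact = TRUE)
      if (is.null(v)) NA else v
    })
    names(vals) <- attrs
    # Guard against empty cards, which have no distribution to inspect
    nq <- if (nrow(card) > 0) nrow(card$forecast_distribution[[1]]) else NA_integer_
    data.frame(
      card = i,
      vals,
      n_locations = nrow(card),
      n_quantiles = nq,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}
```

Applied to the list returned by get_covidhub_predictions() (e.g. `summarize_cards(cards)`), this would give one row per card, making it easier to match each card to a target in the CSV.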

The plan is to remove this function and replace it with covidHubUtils. So if this function is buggy, it is better to import from there and convert the result to a list of cards with the appropriate attributes than to try to fix this one.
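A rough sketch of the suggested conversion step might look like the following. It does not call covidHubUtils. Instead, it assumes forecast rows in the Forecast Hub's long submission format (columns `target`, `location`, `type`, `quantile`, `value`) and builds one card per target, with a nested (quantile, value) table per location and the horizon stored as an attribute. The function name `rows_to_cards()`, the inner column names, and the `ahead`/`target` attribute names are illustrative assumptions, not evalcast's actual card specification.

```r
# Hypothetical converter (not evalcast or covidHubUtils API).
# `df` holds rows in the Forecast Hub long format with columns
# target, location, type, quantile, value; location must be character.
rows_to_cards <- function(df) {
  # Keep quantile rows only; point forecasts have type "point"
  df <- df[df$type == "quantile", , drop = FALSE]
  by_target <- split(df, df$target)
  cards <- lapply(names(by_target), function(tgt) {
    rows <- by_target[[tgt]]
    by_loc <- split(rows, rows$location)
    card <- data.frame(location = names(by_loc), stringsAsFactors = FALSE)
    # One small (quantile, value) data frame per location, sorted by level
    card$forecast_distribution <- lapply(by_loc, function(r) {
      r <- r[order(r$quantile), c("quantile", "value")]
      rownames(r) <- NULL
      r
    })
    # Horizon parsed from targets such as "1 wk ahead inc death"
    attr(card, "ahead") <- as.integer(sub(" .*", "", tgt))
    attr(card, "target") <- tgt
    card
  })
  # Name the list by target so it is self-describing
  setNames(cards, names(by_target))
}
```

When reading a Hub CSV into `df`, make sure location is read as character so that FIPS codes keep their leading zeros.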

jacobbien commented 3 years ago

@brookslogan If you look at COVIDHub's technical README, it says that for the "N wk ahead inc case" target, 7 quantiles should be specified (it actually says 6, but that is a typo), namely c(0.025, 0.100, 0.250, 0.500, 0.750, 0.900, 0.975).
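For reference, both quantile sets can be written out in R. The 7 levels are the ones quoted above for case targets. The 23-level set (0.01, 0.025, 0.05 through 0.95 in steps of 0.05, 0.975, 0.99) is the standard COVID Hub set used for the other targets. `classify_quantiles()` is a hypothetical helper for checking which set a vector of quantile levels matches. It takes the levels directly, so it makes no assumption about the column names inside a forecast_distribution.

```r
# Quantile levels for "N wk ahead inc case" targets (7 levels, as quoted above)
case_levels <- c(0.025, 0.100, 0.250, 0.500, 0.750, 0.900, 0.975)

# Standard 23-level COVID Hub set used for the other targets
full_levels <- c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)

# Hypothetical helper: report which set a vector of quantile levels matches.
# all.equal() compares with a tolerance because seq() yields floating-point values.
classify_quantiles <- function(levels) {
  matches <- function(ref) {
    length(levels) == length(ref) && isTRUE(all.equal(sort(levels), ref))
  }
  if (matches(case_levels)) {
    "case (7 levels)"
  } else if (matches(full_levels)) {
    "full (23 levels)"
  } else {
    "other"
  }
}
```

Under this reading, a 7-row distribution is not necessarily a truncated 23-row one. It may simply be a case target submitted with the smaller set.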

jacobbien commented 3 years ago

A few follow-up comments:

  1. If you want to select only the deaths target, you could use the following:

    cards <- evalcast::get_covidhub_predictions("COVIDhub-baseline", 
                                            forecast_dates = as.Date("2020-09-07"),
                                            response_data_source = "jhu-csse",
                                            response_signal = "deaths_incidence_num")

    This returns just 4 predictions cards. Alternatively, you can download all the cards as you've done and then use evalcast::filter_predictions() in a subsequent step.

  2. Your question about what these 20 cards represent is precisely the motivation for implementing a nice print method that displays the key attributes (such as ahead). Issue #222 will address this. As I describe in #98, the output could look something like this: forecaster name: CMU-TimeSeries; ahead: 2; incidence_period: "epiweek"; geo_type: "state"; forecast_date: 2020-10-01

  3. Until the print method is implemented, you can use the unexported function evalcast:::all_attr(). For example:

    > evalcast:::all_attr(cards, "ahead")
    [[1]]
    [1] 1

    [[2]]
    [1] 2

    [[3]]
    [1] 3

    [[4]]
    [1] 4
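A print method along the lines described in #98 could be sketched as follows. This is a hypothetical sketch, not the implementation planned in #222. It assumes cards are given a class such as `"predictions_card"` and carry attributes named `forecaster`, `ahead`, `incidence_period`, `geo_type` and `forecast_date`. evalcast's actual class and attribute names may differ.

```r
# Hypothetical S3 print method for cards with class "predictions_card".
print.predictions_card <- function(x, ...) {
  keys <- c("forecaster", "ahead", "incidence_period", "geo_type", "forecast_date")
  vals <- vapply(keys, function(k) {
    v <- attr(x, k, exact = TRUE)
    if (is.null(v)) "NA" else paste(format(v), collapse = ", ")
  }, character(1))
  # One header line in the style suggested in #98, e.g. "ahead: 2; geo_type: state"
  cat(paste0(keys, ": ", vals, collapse = "; "), "\n", sep = "")
  # Then print the underlying table using its remaining classes
  print(structure(x, class = setdiff(class(x), "predictions_card")), ...)
  invisible(x)
}
```

With such a method, printing the list of 20 cards would label each one with its ahead and geo_type directly, instead of requiring a separate all_attr() call per attribute.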