challenge on HFR assumes all deaths are within a hospital setting

avallecam commented 5 months ago

The challenge on HFR uses data from {covidregionaldata} for country UK, region England.

Is it fair to assume that all deaths happened in a hospital setting? In any case, would it be appropriate to state that assumption explicitly?

link to question: https://epiverse-trace.github.io/tutorials-middle/severity-static.html#hospitalisation-fatality-risk-hfr
link to assumption pyramid: https://epiverse-trace.github.io/tutorials-middle/severity-static.html#data-sources-for-more-severity-measures

pratikunterwegs commented 4 months ago

I'm not really sure; I think that's a good question for {covidregionaldata} developers who know the package sources better than I do. As a starting point, I would assume that indeed most deaths happened in a hospital setting in England, but let's stick this on the Slack and get one of the PIs to weigh in.

sbfnk commented 4 months ago

There were quite a lot of deaths happening e.g. in care homes, and some at home, that didn't necessarily show up in the hospital data. If an assumption is made that all deaths showed up as hospitalisations I think it's worth writing that explicitly (and perhaps mention why it might be wrong).
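
To make the concern concrete, here is a minimal sketch with made-up counts (every number below is hypothetical and not taken from {covidregionaldata}). If deaths in care homes or at home are counted in the numerator while the denominator only counts hospital admissions, the resulting ratio overstates the HFR.

```r
# Hypothetical counts, for illustration only
hospital_admissions <- 1000
deaths_in_hospital  <- 150
deaths_elsewhere    <- 60   # e.g. care homes or at home, never admitted

# HFR proper: deaths among hospitalised cases / hospitalised cases
hfr_true <- deaths_in_hospital / hospital_admissions

# Ratio obtained when every reported death is assumed to be a hospital death
hfr_assumed <- (deaths_in_hospital + deaths_elsewhere) / hospital_admissions

c(hfr_true = hfr_true, hfr_assumed = hfr_assumed)
```

The size of the gap depends entirely on how many deaths occur outside hospital, which is why the assumption is worth stating explicitly.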

avallecam commented 4 months ago

Thank you for your replies. We'll need to edit it in #34.

I remember that I tried really hard to find a dataset with hospitalizations, but forgot about this assumption. If you recall an available dataset (like a hospital linelist?), it would be helpful as a simpler alternative. I agree we should write out the assumption, but it may generate some confusion too. An alternative could also be to replace it with synthetic data.
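
A rough sketch of what a synthetic alternative could look like, in base R only. This does not use any simulation package, and the column names and probabilities are invented for illustration. Each case carries an explicit hospitalisation flag, so the HFR can be computed without assuming where deaths happened.

```r
set.seed(1)
n_cases <- 500

# Hypothetical linelist: all probabilities are arbitrary illustration values
linelist <- data.frame(id = seq_len(n_cases))
linelist$hospitalised <- rbinom(n_cases, size = 1, prob = 0.3) == 1

# Death risk is higher for hospitalised cases, but other cases can die too
death_prob <- ifelse(linelist$hospitalised, 0.25, 0.03)
linelist$died <- rbinom(n_cases, size = 1, prob = death_prob) == 1

# HFR: deaths among hospitalised cases / hospitalised cases
n_hosp <- sum(linelist$hospitalised)
hfr <- sum(linelist$died & linelist$hospitalised) / n_hosp

# Ratio if all deaths were (wrongly) attributed to hospital
hfr_all_deaths <- sum(linelist$died) / n_hosp

c(hfr = hfr, hfr_all_deaths = hfr_all_deaths)
```

With a flag like `hospitalised` in the data, learners could compare both ratios directly instead of relying on an unstated assumption.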

Bisaloo commented 4 months ago

> An alternative could also be to replace it with synthetic data.

simulist is a good candidate for this. I believe it has all the features you need here and if not, your feedback can be a good source of improvement.

cc @joshwlambert

avallecam commented 4 months ago

Thank you for the feedback. I opted to make this simpler: use a linelist to test cfr::prepare_data() and give visibility to {incidence2}.

library(outbreaks)
library(tidyverse)
library(cfr)

mers_korea_2015$linelist %>% 
  as_tibble() %>% 
  select(starts_with("dt_"))
#> # A tibble: 162 × 6
#>    dt_onset   dt_report  dt_start_exp dt_end_exp dt_diag    dt_death  
#>    <date>     <date>     <date>       <date>     <date>     <date>    
#>  1 2015-05-11 2015-05-19 2015-04-18   2015-05-04 2015-05-20 NA        
#>  2 2015-05-18 2015-05-20 2015-05-15   2015-05-20 2015-05-20 NA        
#>  3 2015-05-20 2015-05-20 2015-05-16   2015-05-16 2015-05-21 2015-06-04
#>  4 2015-05-25 2015-05-26 2015-05-16   2015-05-20 2015-05-26 NA        
#>  5 2015-05-25 2015-05-27 2015-05-17   2015-05-17 2015-05-26 NA        
#>  6 2015-05-24 2015-05-28 2015-05-15   2015-05-17 2015-05-28 2015-06-01
#>  7 2015-05-21 2015-05-28 2015-05-16   2015-05-17 2015-05-28 NA        
#>  8 2015-05-26 2015-05-29 2015-05-15   2015-05-15 2015-05-29 NA        
#>  9 NA         2015-05-29 2015-05-15   2015-05-17 2015-05-29 NA        
#> 10 2015-05-21 2015-05-29 2015-05-16   2015-05-16 2015-05-29 NA        
#> # ℹ 152 more rows

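# Aggregate the linelist into daily counts of onsets and deaths ({incidence2} workflow).
# incidence2::complete_dates() is skipped: the prepare_data() output below
# already includes the dates with no events.
mers_incidence <- mers_korea_2015$linelist %>%
  incidence2::incidence(date_index = c("dt_onset", "dt_death"))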

mers_incidence
#> # incidence:  35 x 3
#> # count vars: dt_onset, dt_death
#>    date_index count_variable count
#>  * <date>     <chr>          <int>
#>  1 2015-05-11 dt_onset           1
#>  2 2015-05-17 dt_onset           1
#>  3 2015-05-18 dt_onset           1
#>  4 2015-05-20 dt_onset           5
#>  5 2015-05-21 dt_onset           6
#>  6 2015-05-22 dt_onset           2
#>  7 2015-05-23 dt_onset           4
#>  8 2015-05-24 dt_onset           2
#>  9 2015-05-25 dt_onset           3
#> 10 2015-05-26 dt_onset           1
#> # ℹ 25 more rows

# Reshape the <incidence2> object into the date/cases/deaths format used by {cfr}
mers_incidence %>%
  cfr::prepare_data(cases_variable = "dt_onset", deaths_variable = "dt_death")
#> NAs in cases and deaths are being replaced with 0s: Set `fill_NA = FALSE` to prevent this.
#>          date deaths cases
#> 1  2015-05-11      0     1
#> 2  2015-05-12      0     0
#> 3  2015-05-13      0     0
#> 4  2015-05-14      0     0
#> 5  2015-05-15      0     0
#> 6  2015-05-16      0     0
#> 7  2015-05-17      0     1
#> 8  2015-05-18      0     1
#> 9  2015-05-19      0     0
#> 10 2015-05-20      0     5
#> 11 2015-05-21      0     6
#> 12 2015-05-22      0     2
#> 13 2015-05-23      0     4
#> 14 2015-05-24      0     2
#> 15 2015-05-25      0     3
#> 16 2015-05-26      0     1
#> 17 2015-05-27      0     2
#> 18 2015-05-28      0     1
#> 19 2015-05-29      0     3
#> 20 2015-05-30      0     5
#> 21 2015-05-31      0    10
#> 22 2015-06-01      2    16
#> 23 2015-06-02      0    11
#> 24 2015-06-03      1     7
#> 25 2015-06-04      1    12
#> 26 2015-06-05      1     9
#> 27 2015-06-06      0     7
#> 28 2015-06-07      0     7
#> 29 2015-06-08      2     6
#> 30 2015-06-09      0     1
#> 31 2015-06-10      2     6
#> 32 2015-06-11      1     3
#> 33 2015-06-12      0     0
#> 34 2015-06-13      0     2
#> 35 2015-06-14      0     0
#> 36 2015-06-15      0     1

# Estimate the overall (static) severity from the prepared data
mers_incidence %>%
  cfr::prepare_data(cases_variable = "dt_onset", deaths_variable = "dt_death") %>%
  cfr::cfr_static()
#> NAs in cases and deaths are being replaced with 0s: Set `fill_NA = FALSE` to prevent this.
#>   severity_mean severity_low severity_high
#> 1    0.07407407   0.03609165     0.1320059

Created on 2024-04-17 with reprex v2.1.0
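
As a sanity check on the last output: with no delay distribution supplied, the point estimate should simply be total deaths over total cases. Summing the columns of the prepared data above gives 10 deaths and 135 cases. The sketch below reproduces the mean in base R. Using `binom.test()` for the interval is an assumption about how {cfr} computes it, so the bounds may not match exactly.

```r
# Totals summed by hand from the prepare_data() output above
total_deaths <- 10
total_cases  <- 135

# Naive (delay-unadjusted) severity estimate
severity_mean <- total_deaths / total_cases
severity_mean

# Exact binomial 95% interval; assumed, not confirmed, to match cfr_static()
severity_ci <- stats::binom.test(total_deaths, total_cases)$conf.int
severity_ci
```

Note that this is a fatality risk among cases with a recorded onset date, not among hospitalised cases.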

avallecam commented 4 months ago

> > An alternative could also be to replace it with synthetic data.
>
> simulist is a good candidate for this. I believe it has all the features you need here and if not, your feedback can be a good source of improvement.
>
> cc @joshwlambert

I just tried {simulist} and filed a (no longer urgent) issue at https://github.com/epiverse-trace/simulist/issues/102 to try to get a scenario similar to the one above.

epiverse-trace / tutorials-middle

challenge on HFR assumes all deaths are within a hospital setting #24