avallecam closed this issue 4 months ago
I'm not really sure; I think that's a good question for {covidregionaldata} developers who know the package sources better than I do. As a starting point, I would assume that indeed most deaths happened in a hospital setting in England, but let's stick this on the Slack and get one of the PIs to weigh in.
There were quite a lot of deaths happening e.g. in care homes, and some at home, that didn't necessarily show up in the hospital data. If an assumption is made that all deaths showed up as hospitalisations I think it's worth writing that explicitly (and perhaps mention why it might be wrong).
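To make the concern concrete, here is a minimal sketch with made-up numbers (not taken from {covidregionaldata} or any real source). It assumes the death series counts all deaths while the denominator counts only hospital admissions; under that assumption the ratio would overstate the hospitalisation fatality risk (HFR). All variable names and values are hypothetical.

```r
# Hypothetical counts, for illustration only (not real data)
hospital_admissions <- 1000
deaths_in_hospital  <- 150
deaths_elsewhere    <- 50  # e.g. care homes or at home

# HFR when only in-hospital deaths are counted
hfr_hospital_only <- deaths_in_hospital / hospital_admissions

# Ratio obtained when every reported death is attributed to a hospitalised case
hfr_all_deaths <- (deaths_in_hospital + deaths_elsewhere) / hospital_admissions

# Relative overestimate introduced by the "all deaths were in hospital" assumption
overestimate <- hfr_all_deaths / hfr_hospital_only - 1

print(c(
  hfr_hospital_only = hfr_hospital_only,
  hfr_all_deaths    = hfr_all_deaths,
  overestimate      = overestimate
))
```

The size of the bias depends entirely on the share of deaths outside hospital, which is why writing the assumption down explicitly seems worthwhile.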
Thank you for your replies. We'll need to edit it in #34.
I remember trying really hard to find a dataset with hospitalizations, but I forgot about this assumption. If you recall an available dataset (like a hospital linelist?), it would be helpful as a simpler alternative. I agree we should state the assumption, although that may generate some confusion too. An alternative could also be to replace it with synthetic data.
simulist is a good candidate for this. I believe it has all the features you need here and if not, your feedback can be a good source of improvement.
cc @joshwlambert
Thank you for the feedback. I opted for a simpler example that exercises cfr::prepare_data() and gives visibility to {incidence2}:
library(outbreaks)
library(tidyverse)
library(cfr)
mers_korea_2015$linelist %>%
  as_tibble() %>%
  select(starts_with("dt_"))
#> # A tibble: 162 × 6
#> dt_onset dt_report dt_start_exp dt_end_exp dt_diag dt_death
#> <date> <date> <date> <date> <date> <date>
#> 1 2015-05-11 2015-05-19 2015-04-18 2015-05-04 2015-05-20 NA
#> 2 2015-05-18 2015-05-20 2015-05-15 2015-05-20 2015-05-20 NA
#> 3 2015-05-20 2015-05-20 2015-05-16 2015-05-16 2015-05-21 2015-06-04
#> 4 2015-05-25 2015-05-26 2015-05-16 2015-05-20 2015-05-26 NA
#> 5 2015-05-25 2015-05-27 2015-05-17 2015-05-17 2015-05-26 NA
#> 6 2015-05-24 2015-05-28 2015-05-15 2015-05-17 2015-05-28 2015-06-01
#> 7 2015-05-21 2015-05-28 2015-05-16 2015-05-17 2015-05-28 NA
#> 8 2015-05-26 2015-05-29 2015-05-15 2015-05-15 2015-05-29 NA
#> 9 NA 2015-05-29 2015-05-15 2015-05-17 2015-05-29 NA
#> 10 2015-05-21 2015-05-29 2015-05-16 2015-05-16 2015-05-29 NA
#> # ℹ 152 more rows
mers_incidence <- mers_korea_2015$linelist %>%
  # incidence2 workflow
  incidence2::incidence(date_index = c("dt_onset", "dt_death")) %>%
  # incidence2::complete_dates()
  identity()
mers_incidence
#> # incidence: 35 x 3
#> # count vars: dt_onset, dt_death
#> date_index count_variable count
#> * <date> <chr> <int>
#> 1 2015-05-11 dt_onset 1
#> 2 2015-05-17 dt_onset 1
#> 3 2015-05-18 dt_onset 1
#> 4 2015-05-20 dt_onset 5
#> 5 2015-05-21 dt_onset 6
#> 6 2015-05-22 dt_onset 2
#> 7 2015-05-23 dt_onset 4
#> 8 2015-05-24 dt_onset 2
#> 9 2015-05-25 dt_onset 3
#> 10 2015-05-26 dt_onset 1
#> # ℹ 25 more rows
mers_incidence %>%
  cfr::prepare_data(cases_variable = "dt_onset", deaths_variable = "dt_death")
#> NAs in cases and deaths are being replaced with 0s: Set `fill_NA = FALSE` to prevent this.
#> date deaths cases
#> 1 2015-05-11 0 1
#> 2 2015-05-12 0 0
#> 3 2015-05-13 0 0
#> 4 2015-05-14 0 0
#> 5 2015-05-15 0 0
#> 6 2015-05-16 0 0
#> 7 2015-05-17 0 1
#> 8 2015-05-18 0 1
#> 9 2015-05-19 0 0
#> 10 2015-05-20 0 5
#> 11 2015-05-21 0 6
#> 12 2015-05-22 0 2
#> 13 2015-05-23 0 4
#> 14 2015-05-24 0 2
#> 15 2015-05-25 0 3
#> 16 2015-05-26 0 1
#> 17 2015-05-27 0 2
#> 18 2015-05-28 0 1
#> 19 2015-05-29 0 3
#> 20 2015-05-30 0 5
#> 21 2015-05-31 0 10
#> 22 2015-06-01 2 16
#> 23 2015-06-02 0 11
#> 24 2015-06-03 1 7
#> 25 2015-06-04 1 12
#> 26 2015-06-05 1 9
#> 27 2015-06-06 0 7
#> 28 2015-06-07 0 7
#> 29 2015-06-08 2 6
#> 30 2015-06-09 0 1
#> 31 2015-06-10 2 6
#> 32 2015-06-11 1 3
#> 33 2015-06-12 0 0
#> 34 2015-06-13 0 2
#> 35 2015-06-14 0 0
#> 36 2015-06-15 0 1
mers_incidence %>%
  cfr::prepare_data(cases_variable = "dt_onset", deaths_variable = "dt_death") %>%
  cfr::cfr_static()
#> NAs in cases and deaths are being replaced with 0s: Set `fill_NA = FALSE` to prevent this.
#> severity_mean severity_low severity_high
#> 1 0.07407407 0.03609165 0.1320059
Created on 2024-04-17 with reprex v2.1.0
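As a sanity check on the reprex above, here is a base R sketch that recomputes the estimate by hand from the daily counts printed by cfr::prepare_data(). It assumes that cfr::cfr_static() without a delay distribution reduces to total deaths over total cases; the exact binomial interval from stats::binom.test() is my guess at how the bounds are obtained, not something confirmed in this thread.

```r
# Daily counts transcribed from the cfr::prepare_data() output above
# (2015-05-11 to 2015-06-15, 36 days)
cases <- c(1, 0, 0, 0, 0, 0, 1, 1, 0, 5, 6, 2, 4, 2, 3, 1, 2, 1,
           3, 5, 10, 16, 11, 7, 12, 9, 7, 7, 6, 1, 6, 3, 0, 2, 0, 1)
deaths <- c(rep(0, 21), 2, 0, 1, 1, 1, 0, 0, 2, 0, 2, 1, 0, 0, 0, 0)

# Guard against transcription slips: one value per day
stopifnot(length(cases) == 36, length(deaths) == 36)

total_cases  <- sum(cases)
total_deaths <- sum(deaths)

# Naive (not delay-corrected) CFR: total deaths over total cases
naive_cfr <- total_deaths / total_cases

# Exact binomial confidence interval (assumed, see the note above)
ci <- stats::binom.test(total_deaths, total_cases)$conf.int

print(c(estimate = naive_cfr, low = ci[1], high = ci[2]))
```

The point estimate is 10 / 135, which matches the severity_mean reported above. If the interval differs from severity_low and severity_high, then the interval method assumed here is not the one {cfr} uses.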
I just tried {simulist} and filed a (no longer urgent) issue at https://github.com/epiverse-trace/simulist/issues/102 to try to get a scenario similar to the one above.
The challenge on HFR uses data from {covidregionaldata} for the UK, region England.
Is it fair to assume that all deaths happened in a hospital setting? Or, in any case, would it be appropriate to state that assumption explicitly?