epiverse-trace / cfr

R package to estimate disease severity and under-reporting in real-time, accounting for reporting delays in epidemic time-series
https://epiverse-trace.github.io/cfr/

Is `cfr_rolling()` suitable for small time-series and `cfr_time_varying()` for larger ones? #123

Closed avallecam closed 7 months ago

avallecam commented 9 months ago

From documentation:

> To understand how severity has changed over time (e.g. following vaccination or pathogen evolution), use the function cfr_time_varying(). This function is however not well suited to small outbreaks because it requires sufficiently many cases over time to estimate how CFR changes.

However, I do not find a specific reference to a difference or direct comparison between cfr_rolling() and cfr_time_varying().

From reprex-es below, I find that:

Can I conclude that cfr_rolling() is useful for real-time estimation, while cfr_time_varying() is better suited to retrospective assessments?

small outbreak time-series

# Load package
library(cfr)
library(tidyverse)

# Load the Ebola 1976 data provided with the package
data("ebola1976")

# Calculate the rolling daily CFR while correcting for delays
rolling_cfr_corrected <- cfr_rolling(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)

# Calculate the time varying daily CFR while correcting for delays
time_varying_cfr_corrected <- cfr_time_varying(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)

# combine the data for plotting
rolling_cfr_corrected$method <- "rolling"
time_varying_cfr_corrected$method <- "time_varying"

data_cfr <- rbind(
  rolling_cfr_corrected,
  time_varying_cfr_corrected
)

# visualise the rolling and time-varying estimates together
ggplot(data_cfr) +
  geom_ribbon(
    aes(
      date,
      ymin = severity_low, ymax = severity_high,
      fill = method
    ),
    alpha = 0.2, show.legend = FALSE
  ) +
  geom_line(
    aes(date, severity_mean, colour = method)
  )
#> Warning: Removed 19 rows containing missing values (`geom_line()`).

Created on 2024-01-31 with reprex v2.0.2

large outbreak time-series

# Load package
library(cfr)
library(tidyverse)

# get data pre-loaded with the package
data("covid_data")
df_covid_uk <- covid_data[covid_data$country == "United Kingdom", ]

# estimate time varying severity while correcting for delays
time_varying_cfr <- cfr_time_varying(
  data = df_covid_uk,
  delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440),
  burn_in = 7L
)

covid_rolling <- cfr_rolling(
  data = df_covid_uk,
  delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440)
)

time_varying_cfr %>% 
  mutate(method = "time_varying") %>% 
  bind_rows(
    covid_rolling %>% 
      mutate(method = "rolling")
  ) %>% 
  ggplot() +
  geom_ribbon(
    aes(
      date,
      ymin = severity_low, ymax = severity_high,
      fill = method
    ),
    alpha = 0.2, show.legend = FALSE
  ) +
  geom_line(
    aes(date, severity_mean, color = method)
  )
#> Warning: Removed 68 rows containing missing values (`geom_line()`).

Created on 2024-01-31 with reprex v2.0.2

pratikunterwegs commented 9 months ago

Thanks @avallecam - the two functions are aimed at providing different functionalities.

avallecam commented 9 months ago

Useful clarification.

About the different trends that we get from the two methods, any suggestions about how to discuss it?

pratikunterwegs commented 9 months ago

> About the different trends that we get from the two methods, any suggestions about how to discuss it?

I think the key is to discuss the static and time varying methods and where they apply. The rolling method is perhaps more useful to check whether an outbreak's CFR estimate has stabilised. The rolling and time-varying methods aren't really comparable, so I wouldn't really discuss them together. Is it worth mentioning some reasons not to interpret the rolling estimate in relation to the time-varying one?
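One way to make "has the estimate stabilised" concrete is sketched below. It is only an illustration: `has_stabilised()` is a hypothetical helper and not part of cfr, the window and tolerance are arbitrary assumptions, and the input values are made up to stand in for the `severity_mean` column used in the reprex above.

```r
# Hypothetical helper (not part of cfr): returns TRUE when the most recent
# `window` non-missing estimates all lie within `tol` of each other.
has_stabilised <- function(estimates, window = 7, tol = 0.02) {
  recent <- utils::tail(estimates[!is.na(estimates)], window)
  if (length(recent) < window) {
    return(FALSE) # too few estimates to judge
  }
  diff(range(recent)) <= tol
}

# Made-up values standing in for the `severity_mean` column of a rolling estimate
toy_rolling <- c(NA, 0.90, 0.70, 0.55, 0.50, 0.49, 0.48, 0.48, 0.47, 0.47, 0.47)

has_stabilised(toy_rolling, window = 5, tol = 0.02) # short recent window
has_stabilised(toy_rolling, window = 8, tol = 0.02) # longer window, includes early swings
```

The choice of `window` and `tol` is a judgment call. A wider window or tighter tolerance delays the point at which the estimate is treated as settled.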

adamkucharski commented 9 months ago

Perhaps a useful rule of thumb is to discuss in context of the sampling uncertainty. E.g. with 100 cases, the fatality risk estimate will, roughly speaking, have a 95% confidence interval ±10% of the mean estimate (binomial CI). So if we have >100 cases with expected outcomes on a given day, we can get reasonable estimates of the time varying CFR. But if we only have >100 cases over the course of the whole epidemic, we probably need to rely on the static version that uses the cumulative data.
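The ±10% figure can be checked with a short calculation. The sketch below uses the normal approximation to the binomial interval. `ci_half_width()` is a hypothetical helper, not a cfr function, and the fatality risks of 0.5 and 0.1 are illustrative assumptions (0.5 gives the widest interval).

```r
# Hypothetical helper (not part of cfr): approximate half-width of a 95%
# binomial confidence interval (normal approximation) for a fatality risk
# `p` estimated from `n` cases with known outcomes.
ci_half_width <- function(n, p = 0.5) {
  stats::qnorm(0.975) * sqrt(p * (1 - p) / n)
}

n_cases <- c(10, 100, 1000)

# Half-widths on the probability scale for two assumed fatality risks
data.frame(
  n_cases = n_cases,
  half_width_p50 = round(ci_half_width(n_cases, p = 0.5), 3),
  half_width_p10 = round(ci_half_width(n_cases, p = 0.1), 3)
)
```

With 100 cases and a fatality risk of 0.5 the half-width is about 0.098, i.e. roughly ±10 percentage points. It shrinks in proportion to the square root of the number of cases, which is why daily estimates need many cases per day.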

pratikunterwegs commented 9 months ago

Thanks @adamkucharski! @avallecam, did you have any specific suggestions for where these clarifications should be added?

avallecam commented 8 months ago

> did you have any specific suggestions for where these clarifications should be added?

In the drafted tutorial episode, we include the clarifications shared here. Please see that section in the working branch (edit suggestions are welcome in the working PR). At the end, we refer to the vignette on cfr_time_varying().

Although this outbreak-size detail is mentioned briefly in the [get started](https://epiverse-trace.github.io/cfr/articles/cfr.html?q=cfr_time_varying()#estimate-disease-severity) vignette, it would probably be informative to extend the current vignette to show this point specifically: how the time-varying method performs with different numbers of cases per day. Our reference from the tutorials to the vignette could then be more specific. If this content gets too long, we could consider a separate vignette. If appropriate, it could be based on the reprex above.
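As a possible starting point for such a comparison, the sketch below summarises how often a time series reaches a given number of cases per day. It is a rough illustration only: `share_days_above()` is a hypothetical helper and not part of cfr, the threshold of 100 follows the rule of thumb discussed in this thread, the counts are made up, and raw daily cases are used as a simplification of "cases with expected outcomes".

```r
# Hypothetical helper (not part of cfr): share of days whose case count
# reaches `threshold`, as a rough guide to whether daily data are dense
# enough for a time-varying estimate.
share_days_above <- function(data, threshold = 100) {
  stopifnot(is.data.frame(data), "cases" %in% names(data))
  mean(data$cases >= threshold)
}

# Made-up daily counts for two outbreaks of different size
small_outbreak <- data.frame(
  date = as.Date("2024-01-01") + 0:9,
  cases = c(1, 3, 5, 8, 12, 9, 6, 4, 2, 1)
)
large_outbreak <- data.frame(
  date = as.Date("2024-01-01") + 0:9,
  cases = c(80, 150, 300, 600, 900, 1200, 1000, 700, 400, 200)
)

share_days_above(small_outbreak) # no day reaches the threshold
share_days_above(large_outbreak) # most days reach the threshold
```

A low share suggests relying on the cumulative (static) estimate. A high share suggests the daily data may support a time-varying estimate.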

pratikunterwegs commented 7 months ago

Closed as answered, and this will be addressed by #128