avallecam commented 10 months ago

To understand how severity has changed over time (e.g. following vaccination or pathogen evolution), use the function cfr_time_varying(). This function is however not well suited to small outbreaks because it requires sufficiently many cases over time to estimate how CFR changes.

However, I do not find a specific reference to a difference or direct comparison between cfr_rolling() and cfr_time_varying().

From the reprexes below, I find that:

cfr_rolling() is more suited to ebola1976 (small time-series), while
cfr_time_varying() is more suited to covid_data (larger time-series).

Can I conclude that cfr_rolling() would be useful for real-time estimations, while cfr_time_varying() for retrospective assessments?

small outbreak time-series

# Load package
library(cfr)
library(tidyverse)

# Load the Ebola 1976 data provided with the package
data("ebola1976")

# Calculate the rolling daily CFR while correcting for delays
rolling_cfr_corrected <- cfr_rolling(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)

# Calculate the time varying daily CFR while correcting for delays
time_varying_cfr_corrected <- cfr_time_varying(
  data = ebola1976,
  delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)

# combine the data for plotting
rolling_cfr_corrected$method <- "rolling"
time_varying_cfr_corrected$method <- "time_varying"

data_cfr <- rbind(
  rolling_cfr_corrected,
  time_varying_cfr_corrected
)

# visualise the rolling and time-varying estimates together
ggplot(data_cfr) +
  geom_ribbon(
    aes(
      date,
      ymin = severity_low, ymax = severity_high,
      fill = method
    ),
    alpha = 0.2, show.legend = FALSE
  ) +
  geom_line(
    aes(date, severity_mean, colour = method)
  )
#> Warning: Removed 19 rows containing missing values (`geom_line()`).

Created on 2024-01-31 with reprex v2.0.2

large outbreak time-series

# Load package
library(cfr)
library(tidyverse)

# get data pre-loaded with the package
data("covid_data")
df_covid_uk <- covid_data[covid_data$country == "United Kingdom", ]

# estimate time varying severity while correcting for delays
time_varying_cfr <- cfr_time_varying(
  data = df_covid_uk,
  delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440),
  burn_in = 7L
)

covid_rolling <- cfr_rolling(
  data = df_covid_uk,
  delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440)
)

time_varying_cfr %>% 
  mutate(method = "time_varying") %>% 
  bind_rows(
    covid_rolling %>% 
      mutate(method = "rolling")
  ) %>% 
  ggplot() +
  geom_ribbon(
    aes(
      date,
      ymin = severity_low, ymax = severity_high,
      fill = method
    ),
    alpha = 0.2, show.legend = FALSE
  ) +
  geom_line(
    aes(date, severity_mean, color = method)
  )
#> Warning: Removed 68 rows containing missing values (`geom_line()`).

Created on 2024-01-31 with reprex v2.0.2

pratikunterwegs commented 10 months ago

Thanks @avallecam - the two functions are aimed at providing different functionalities.

cfr_rolling() shows what the estimated CFR would be on each day of the outbreak, given that future data on cases and deaths are not available at that time. The final value of the cfr_rolling() estimates is expected to be identical to the value of cfr_static() on the same data. As far as I know, this is not sensitive to the length of the outbreak.
cfr_time_varying() calculates the CFR over a moving window, and helps to understand changes in CFR due to changes in the epidemic, e.g. due to a new variant or increased immunity from vaccination. It performs poorly for short outbreaks because it returns NA in three situations: at the start of the series (the burn_in period, which is discarded), at the end of the series (due to the size of smoothing_window), and wherever deaths < estimated deaths and estimated deaths > 0. I think it is less suitable for smaller outbreaks because these conditions are more common there, so more NAs are returned.
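
To make the distinction above concrete, here is a minimal base R sketch. It is not the algorithm used inside the cfr package: it ignores delay correction, uses made-up toy counts, and the centred three-day window is a simplifying assumption chosen only for illustration. It shows that a cumulative ("rolling"-style) naive CFR ends at the same value as the all-data ("static"-style) naive CFR, while a windowed estimate has NA at the edges of the series.

```r
# Toy daily counts (made up for illustration, not real outbreak data)
cases  <- c(5, 8, 12, 20, 30, 25, 18, 10, 6, 3)
deaths <- c(0, 1, 2, 4, 7, 8, 6, 4, 2, 1)
n <- length(cases)

# Static-style naive CFR: all deaths divided by all cases
static_cfr <- sum(deaths) / sum(cases)

# Rolling-style naive CFR: cumulative deaths over cumulative cases, day by day
rolling_cfr <- cumsum(deaths) / cumsum(cases)

# Windowed naive CFR: deaths over cases within a centred window.
# Days without a full window are left as NA (hypothetical window size).
window <- 3
half <- (window - 1) / 2
windowed_cfr <- rep(NA_real_, n)
for (i in seq_len(n)) {
  idx <- (i - half):(i + half)
  if (min(idx) >= 1 && max(idx) <= n) {
    windowed_cfr[i] <- sum(deaths[idx]) / sum(cases[idx])
  }
}

# The last rolling value matches the static value
isTRUE(all.equal(rolling_cfr[n], static_cfr))

# Number of days for which the windowed estimate is unavailable
sum(is.na(windowed_cfr))
```

With a short series, the days lost at the edges are a larger share of the data, which is one way to see why a windowed method suffers more in small outbreaks.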

avallecam commented 10 months ago

Useful clarification.

cfr_rolling() applies the cfr_static() calculation to the cumulative cases and deaths up to each day.
About cfr_time_varying(), we could say that it is sensitive to the length of the outbreak, given that burn_in and smoothing_window discard data, and to the number of deaths in the outbreak, given the constraint on estimated deaths that it needs.

About the different trends that we get from the two methods, any suggestions about how to discuss it?

pratikunterwegs commented 10 months ago

> About the different trends that we get from the two methods, any suggestions about how to discuss it?

I think the key is to discuss the static and time varying methods and where they apply. The rolling method is perhaps more useful to check whether an outbreak's CFR estimate has stabilised. The rolling and time-varying methods aren't really comparable, so I wouldn't really discuss them together. Is it worth mentioning some reasons not to interpret the rolling estimate in relation to the time-varying one?

adamkucharski commented 10 months ago

Perhaps a useful rule of thumb is to discuss this in the context of the sampling uncertainty. For example, with 100 cases, the fatality risk estimate will, roughly speaking, have a 95% confidence interval of at most about ±10 percentage points around the mean estimate (binomial CI). So if we have >100 cases with expected outcomes on a given day, we can get reasonable estimates of the time-varying CFR. But if we only have >100 cases over the course of the whole epidemic, we probably need to rely on the static version that uses the cumulative data.
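
The rule of thumb above can be checked with a short base R sketch. The helper name `ci_half_width` is hypothetical, and the normal approximation to the binomial is an assumption made for simplicity. At the widest point, p = 0.5, the half-width for 100 cases is about 0.098.

```r
# Half-width of an approximate 95% binomial confidence interval
# (normal approximation; `ci_half_width` is a hypothetical helper name)
ci_half_width <- function(p, n) {
  stats::qnorm(0.975) * sqrt(p * (1 - p) / n)
}

# Half-widths at the widest point (p = 0.5) for different case counts
n_cases <- c(10, 100, 1000)
half_widths <- ci_half_width(p = 0.5, n = n_cases)
print(data.frame(n_cases, half_widths))

# Exact (Clopper-Pearson) interval for an illustrative 30 deaths in 100 cases
print(stats::binom.test(x = 30, n = 100)$conf.int)
```

The half-width shrinks with the square root of the number of cases, so ten times more cases per day narrows the interval by a factor of about three.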

pratikunterwegs commented 10 months ago

Thanks @adamkucharski! @avallecam, did you have any specific suggestions for where these clarifications should be added?

avallecam commented 9 months ago

> did you have any specific suggestions for where these clarifications should be added?

In the drafted tutorial episode, we include the clarifications shared here. Please see that section in the working branch (edit suggestions are welcome in the working PR). At the end, we refer to the vignette on cfr_time_varying().

Although this outbreak-size detail is mentioned briefly in the [get started](https://epiverse-trace.github.io/cfr/articles/cfr.html?q=cfr_time_varying()#estimate-disease-severity) vignette, it would probably be informative to extend the current vignette to show this point in particular: how the time-varying method performs with different numbers of cases per day. Our reference from the tutorials to the vignette could then be more specific. If this content gets too long, we can consider a separate vignette. If appropriate, this could be based on the reprex above.

pratikunterwegs commented 8 months ago

Closed as answered, and this will be addressed by #128

epiverse-trace / cfr

is `cfr_rolling()` suitable for small time-series and `cfr_time_varying()` for larger ones? #123
