epiverse-trace / epiparameter

R package with library of epidemiological parameters for infectious diseases and functions and classes for working with parameters
https://epiverse-trace.github.io/epiparameter
Other
32 stars 11 forks source link

Vignette on fitting parameters and converting into epiparameter objects #250

Closed adamkucharski closed 7 months ago

adamkucharski commented 8 months ago

A common user need for those who work with linelist data will be to estimate delay distributions then store for future analysis. So could be useful to have a vignette describing key choices/caveats for doing this, as well as showcasing how to get the result into {epiparameter} format. Some example code is provided below showing calculation of onset-to-death from simulated truncated data during an increasing epidemic, under two approximate (but fast and easy to implement) estimation methods:

Simulating 500 onsets with an epidemic that grows at 10% per day (black points below show true simulated distribution of onset-to-death), the naive estimate has 117 data points to fit to (red line) and the cohort estimate has 52 data points (blue line).

comparison

Code for simulations (this could probably be done much more efficiently using {simulist}, which would also highlight nice integration potential between these packages):

library(epiparameter)
library(fitdistrplus)

# Define 50 day window for possible infections
max_day <- 50
time_x <- 1:max_day

# Assume epidemic growing at 10% per day
# Recent infections more likely in growing epidemic
growing_epidemic <- exp(0.1*time_x)
growing_epidemic_prob <- growing_epidemic/sum(growing_epidemic)

# Simulate onset times
sample_onsets <- 500

set.seed(10)
onset_times <- sample(time_x,size=sample_onsets,replace=T,prob=growing_epidemic_prob)
onset_times <- sort(onset_times)

# Sample onset to death times
onset_to_death_period_in <-
  epiparameter::epidist_db(disease = "covid",epi_dist = "onset to death",single_epidist = T) |> 
  discretise()

death_times <- (generate(onset_to_death_period_in,200) + onset_times) |> ceiling()

# Naive calculation for real-time onset to death on day 50
# I.e. higher bias, lower uncertainty method
death_times_rt <- death_times
death_times_rt[death_times_rt>max_day] <- NA # Remove truncated values not available in real-time

onset_to_death_naive <- death_times_rt-onset_times
onset_to_death_naive <- onset_to_death_naive[!is.na(onset_to_death_naive)] # remove NA

# Note fit to continuous distribution for simplicity
param_naive <- fitdist(data = onset_to_death_naive, distr = "lnorm") 

# Cohort calculation for real-time onset to death on day 50
# I.e. lower bias, higher uncertainty method
cut_off_range <- 20

death_times_rt <- death_times
death_times_rt[death_times_rt>max_day] <- NA # Remove truncated values not available in real-time
death_times_rt[onset_times>(max_day - cut_off_range)] <- NA # Remove early onsets (i.e. <20 days to current point)

onset_to_death_cohort <- death_times_rt-onset_times
onset_to_death_cohort <- onset_to_death_cohort[!is.na(onset_to_death_cohort)] # remove NA

param_cohort <- fitdist(data = onset_to_death_cohort, distr = "lnorm")

# Compare fits with original distribution
x_val <- 0:40
y_val <- density(onset_to_death_period_in,x_val)
par(mar=c(4,4,1,1),mgp=c(2,0.7,0),las=0)
plot(x_val, y_val, xlab = "days",ylab="probability",ylim=c(0,0.1))
lines(x_val,dlnorm(x_val, meanlog = param_naive$estimate["meanlog"], sdlog = param_naive$estimate["sdlog"]), 
      col = "red", lwd = 2)
lines(x_val,dlnorm(x_val, meanlog = param_cohort$estimate["meanlog"], sdlog = param_cohort$estimate["sdlog"]), 
      col = "blue", lwd = 2)
adamkucharski commented 7 months ago

Now decided this is better placed as a howto, bringing together all the examples listed above.