Open AFg6K7h4fhy2 opened 3 weeks ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 92.39%. Comparing base (
71228e5
) to head (d5671aa
). Report is 1 commits behind head on main.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@dylanhmorris
Would appreciate some Why thoughts on the following settings:
day_of_week_effect_prior_modes = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
day_of_week_effect_prior_scales = [0.25, 0.25, 0.25, 0.25, 0.25, 0.25]
holiday_eff_prior_mode = -0.2
holiday_eff_prior_scale = 0.2
post_holiday_eff_prior_mode = 0.0
post_holiday_eff_prior_scale = 0.2
recency_eff_prior_mode = 0.0
recency_eff_prior_scale = 0.1
recency_effect_length = 0
generation_time_dist = [0.233, 0.359, 0.198, 0.103, 0.053, 0.027, 0.014, 0.007, 0.003, 0.002, 0.001]
ihr_intercept_prior_mode = -6.5
ihr_intercept_prior_scale = 0.5
inf_model_prior_infections_per_capita = 0.0001
inf_model_seeds = 8
inf_to_hosp_dist = [0.05, 0.1, 0.175, 0.225, 0.175, 0.125, 0.075, 0.05, 0.025]
max_rt = 3.0
n_pre_observation_days = 14
non_obs_effect_prior_mode = -50
non_obs_effect_prior_scale = 0.01
reciprocal_dispersion_prior_mode = 10
reciprocal_dispersion_prior_scale = 5
reference_day_of_week = 4
rt_intercept_prior_mode = -0.4054651
rt_intercept_prior_scale = 0.3
seed = 54321
susceptible_fraction_prior_mode = 0.9
susceptible_fraction_prior_scale = 0.1
weekly_rw_prior_scale = 0.25
first_fitting_date = "2023-09-15"
[mcmc]
adapt_delta = 0.9
max_treedepth = 12
n_chains = 1
n_warmup = 1000
n_iter = 2000
Apologies for not putting this question here sooner. I do not know why I bottled it in my markdown file.
I would also appreciate some thoughts on my attempted description of the transmission component of cfaepim
:
In cfaepim
, the influenza hospitalizations for each US territory are modelled separately, as though influenza transmission, infection, and observation within each territory's boundaries is completely independent of other territories. This corresponds to an "unpooled" modelling case (see Epidemia's partial pooling page). For each US territory, $\mathscr{R}_t$ can be modeled as a transformed linear prediction:
$$\mathscr{R}_t = g^{-1}(\eta_t)$$
where $g$ is a link function and $\eta_t$ is the prediction on the transformed scale.
The link function for cfaepim
is a scaled logit, so:
$$g^{-1}(x) = \frac{K}{1 + e^{-x}}$$
with $K = 3$ since max_rt = 3.0
. As for each territory's transformed scale prediction $\eta_t$, we have
$$\eta_t = \beta_0 + W_t$$
where $W_t$ is a random walk satisfying $$Wt = W{t-1} + \gamma_t$$ with $t > 0$ and $W_0 = 0$, and $\gamma_t \sim \mathcal{N}(0, \sigma)$, where $\sigma \sim \mathcal{N}^{+}(0, 0.25)$ since weekly_rw_prior_scale = 0.25
. Note that the $\mathscr{R}_t$ intercept, i.e. $\beta_0$, has a prior. Since rt_intercept_prior_mode = -0.4054651
and rt_intercept_prior_scale = 0.3
, $\beta_0 \sim \mathcal{N}(-0.4054651, 0.3)$.
Original R code for reference:
build_light_rt <- function(rt_intercept_prior_mode,
rt_intercept_prior_scale,
max_rt,
rw_prior_scale) {
rt_model <- epidemia::epirt(
formula = as.formula(
sprintf(
paste0(
"R(location, date) ~ 1 + ",
"rw(time = week, gr = location, prior_scale = %f)"
),
rw_prior_scale
)
),
prior_intercept = rstanarm::normal(
location = rt_intercept_prior_mode,
scale = rt_intercept_prior_scale
),
link = epidemia::scaled_logit(K = max_rt)
)
return(rt_model)
}
Similar to above, would appreciate some thoughts on the veracity of the following description:
Proceeding, each US territory has is own time series of observed influenza hospitalizations $Y = (Y_1, \dotsc, Y_n)$. The observed influenza hospitalizations at time $t$ are given by $$Y_t \sim p(y_t, \phi)$$ where $p(y_t, \phi)$ is the sampling distribution with auxiliary parameter $\phi$. Note that since reciprocal_dispersion_prior_mode = 10
and reciprocal_dispersion_prior_scale = 5
, we have $\phi \sim \mathcal{N}(10, 5)$. Since family = "neg_binom"
, $\phi$ represents the reciprocal dispersion of the sampling distribution.
The expected value for the observed influenza hospitalizations on day $t$, i.e. $y_t$, is given by
$$y_t = \alphat \sum{s < t} is \pi{t-s}$$
with $s < t$ and where
link = "logit"
) function $g$, where $$g^{-1}(x) = \frac{1}{1 + e^{-x}}$$Within cfaepim
, the linear predictor for $\alpha_t$ consists of
day_of_week
).is_holiday
).is_post_holiday
). We have:
$$ \begin{aligned} \alpha_t &= g^{-1}(\eta_t)\ \etat &= \sum{i=0}^9 \beta_i \end{aligned} $$
where
ihr_intercept_prior_mode = -6.5
and ihr_intercept_prior_scale = 0.5
. day_of_week_effect_prior_modes
and day_of_week_effect_prior_scales
).holiday_eff_prior_mode = -0.2
and holiday_eff_prior_scale = 0.2
, and refers to the holiday effect.post_holiday_eff_prior_mode = 0.0
and post_holiday_eff_prior_scale = 0.2
, and refers to the post-holiday effect.non_obs_effect_prior_mode = -50
and non_obs_effect_prior_scale = 0.01
, and refers to the change in the observation probability during the nonobservation (seeding) period.Within cfaepim
, we have the infection to hospitalization delay distribution as
$$\pi = \begin{bmatrix} 0.05 & 0.1 & 0.175 & 0.225 & 0.175 & 0.125 & 0.075 & 0.05 & 0.025\end{bmatrix}$$
given the selected value for inf_to_hosp_dist
.
The corresponding code in cfaepim
:
build_light_obs <- function(inf_to_hosp_dist,
ihr_intercept_prior_mode,
ihr_intercept_prior_scale,
day_of_week_eff_prior_modes,
day_of_week_eff_prior_scales,
holiday_eff_prior_mode,
holiday_eff_prior_scale,
post_holiday_eff_prior_mode,
post_holiday_eff_prior_scale,
non_obs_effect_prior_mode,
non_obs_effect_prior_scale,
inv_dispersion_prior_mode,
inv_dispersion_prior_scale,
link = "logit") {
return(epidemia::epiobs(
formula = as.formula(paste0(
"hosp ~ 1 + day_of_week + ",
"is_holiday + ",
"is_post_holiday + ",
"nonobservation_period"
)),
## Add a covariate for the
## nonobservation window to
## leave an initial evolution
## period with no observations
i2o = inf_to_hosp_dist,
link = link,
family = "neg_binom",
prior_intercept = rstanarm::normal(
location = ihr_intercept_prior_mode,
scale = ihr_intercept_prior_scale
),
prior = rstanarm::normal(
location = c(
day_of_week_eff_prior_modes,
holiday_eff_prior_mode,
post_holiday_eff_prior_mode,
non_obs_effect_prior_mode
),
## a large negative non_obs_effect
## effectively conditions on detection
## prob = 0 outside the observation period
scale = c(
day_of_week_eff_prior_scales,
holiday_eff_prior_scale,
post_holiday_eff_prior_scale,
non_obs_effect_prior_scale
## non-obs prior scale
## should be small to
## enforce non-obs effect
## close to (large negative) mode
)
),
prior_aux = rstanarm::normal(
location = inv_dispersion_prior_mode,
scale = inv_dispersion_prior_scale
)
))
}
@AFg6K7h4fhy2 terminology edits only for the post on the $\mathcal{R}(t)$ model.
Proceeding, each US territory has is own time series of observed influenza hospitalizations $Y = (Y_1, \dotsc, Y_n)$. The observed influenza hospitalizations at time $t$ are given by $$Y_t \sim p(y_t, \phi)$$ where $p(y_t, \phi)$ is the sampling distribution with auxiliary parameter $\phi$. Note that since
reciprocal_dispersion_prior_mode = 10
andreciprocal_dispersion_prior_scale = 5
, we have $\phi \sim \mathcal{N}(10, 5)$. Sincefamily = "neg_binom"
, $\phi$ represents the reciprocal dispersion of the sampling distribution.
Should state explicitly that family = "neg_binom"
means that $p$ becomes a negative binomial parameterized by its mean and reciprocal dispersion. I also might not use $p$ to denote the distribution, since it often denotes a specific probability. Perhaps something like $\sim \mathcal{D}(y_t, \phi)$ or $\sim \mathrm{Dist}(y_t, \phi)$ if your goal is to indicate that the choice of distribution is not fixed but rather determined by the family
argument.
- $\alpha_t>0$ represents an instantaneous ascertainment rate, which in this instance means the rate of influenza infections that become hospitalizations at time $t$
Suggest reported hospitalizations
transformed by the logit-link (since
link = "logit"
) function $g$, where $$g^{-1}(x) = \frac{1}{1 + e^{-x}}$$
Suggest modeled by a linear model with a logit link
Within
cfaepim
, the linear predictor for $\alpha_t$ consists of
linear prediction
We have:
$$ \begin{aligned} \alpha_t &= g^{-1}(\eta_t)\ \etat &= \sum{i=0}^9 \beta_i \end{aligned} $$
Need to include the predictor variables, not just the coefficients:
$$ \begin{aligned} \alpha_t &= g^{-1}(\eta_t)\ \etat &= \sum{i=0}^9 \beta_i x_i \end{aligned} $$
- each of the covariates $\beta_1, \dotsc, \beta_6$ have priors, $\beta_i \sim \mathcal{N}(0, 0.25) \quad i = 1, \dotsc, 6$, corresponding, to the week from Sunday to Friday (where the reference date is Saturday)
The reference date is not necessarily Saturday. It is set in the config by the reference_day_of_week
entry. The day of week values are then encoded relative to this reference day in the build_state_light_model()
function here:
## make sure day_of_week is properly
## set up as a factor
dow_levels <- levels(
lubridate::wday("2023-01-01",
label = TRUE,
week_start = params$reference_day_of_week
)
)
clean_data <- clean_data |>
dplyr::mutate(
day_of_week = factor(day_of_week,
ordered = FALSE,
levels = dow_levels
)
)
@AFg6K7h4fhy2
NOTE: This is currently a work in progress. I've committed a skeleton of the Quarto tutorial but need to continuing finishing the implementation in Python before porting over more and then writing in detail. Consider this draft PR and its initial commit a placeholder.
Recall the goal from the corresponding issue:
And also recall the desired features: