CDCgov / ww-inference-model

An in-development R package and a Bayesian hierarchical model jointly fitting multiple "local" wastewater data streams and "global" case count data to produce nowcasts and forecasts of both observations
https://cdcgov.github.io/ww-inference-model/
Apache License 2.0

Refactor hierarchical site-level estimation to estimate `n_subpops-1` deviations from the "reference" subpopulation #149

Closed kaitejohnson closed 1 month ago

kaitejohnson commented 2 months ago

Edited to reflect additional intercept term and propose optional patchwork:

Goal

This should address #136, but also a more fundamental issue brought up by @damonbayer and @sbidari, which is that the current version of the "hospital admissions only" model is still going to estimate n_subpops infection dynamics, even though this will not be informed by wastewater concentration data and should not have a patchwork or hierarchical population structure. Instead, the desired behavior here is that:

This results in the following changes to the model definition:

Currently, we estimate a global undamped effective reproductive number $\mathcal{R}^\mathrm{u}(t)$. We now instead will estimate a reference effective reproductive number $\mathcal{R}^0(t)$ with $K_{\mathrm{subpops}}-1$ deviations from the reference.

The number of subpopulations falls under a few distinct cases:

Thus we have the following proposed rewrite to the "Subpopulation level infections":

Subpopulation-level infections

We couple the subpopulation and total population infection dynamics at the level of the un-damped instantaneous reproduction number $\mathcal{R}^\mathrm{u}_{0}(t)$.

We model the subpopulations as having infection dynamics that are similar to one another but can differ from the reference dynamic.

We represent this with a hierarchical model where we estimate a reference un-damped effective reproductive number $\mathcal{R}^\mathrm{u}_{0}(t)$ and then estimate the individual subpopulation $k$ deviations from the reference value, $\mathcal{R}^{\mathrm{u}}_{k}(t)$.

The reference value for the undamped instantaneous reproductive number $\mathcal{R}^\mathrm{u}_0(t)$ follows the time-evolution described above. Subpopulation deviations from the reference reproduction number are modeled via a log-scale AR(1) process. Specifically, for subpopulation $k$:

$$ \log[\mathcal{R}^\mathrm{u}_{k}(t)] = \log[\mathcal{R}^\mathrm{u}_0(t)] + m +\delta_k(t) $$

where $m$ is an "intercept" for the reference subpopulation, which is a fixed inferred parameter and allows for the fact that $\log[\mathcal{R}^\mathrm{u}_0(t)]$ may be the reference value but doesn't have to be the central value.

$\delta_k(t)$ is the time-varying subpopulation effect on $\mathcal{R}^\mathrm{u}_0(t)$, modeled as,

$$\delta_k(t) = \varphi_{R(t)} \delta_k(t-1) + \epsilon_{kt}$$

where $0 < \varphi_{R(t)} < 1$ and $\epsilon_{kt} \sim \mathrm{Normal}(0, \sigma_{R(t)\delta})$.

We chose a prior of $\varphi_{R(t)} \sim \mathrm{beta}(2,40)$ to impose limited autocorrelation in the week-by-week deviations. We set a weakly informative prior $\sigma_{R(t)\delta} \sim \mathrm{Normal}(0, 0.3)$ to allow for either limited or substantial site-site heterogeneity in $\mathcal{R}^\mathrm{u}_0(t)$, with the degree of heterogeneity inferred from the data.
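
A minimal Python sketch of the process above may help when checking the notation. It is for illustration only and is not package code; the function names `simulate_deviations` and `subpop_log_rt` are hypothetical. It assumes each $\delta_k$ series starts at 0 (the initial condition is not specified here), that the $\mathrm{Normal}(0, 0.3)$ prior on the standard deviation is truncated at zero, and it uses an arbitrary flat reference value and an arbitrary intercept $m$.

```python
import math
import random


def simulate_deviations(n_deviations, n_weeks, phi, sigma, rng):
    """Simulate delta_k(t) = phi * delta_k(t-1) + eps_kt for each deviating subpopulation."""
    deviations = []
    for _ in range(n_deviations):
        # Assumption: every series starts at 0; delta_k(0) is not specified in the text.
        delta = [0.0]
        for _ in range(1, n_weeks):
            eps = rng.gauss(0.0, sigma)  # eps_kt ~ Normal(0, sigma)
            delta.append(phi * delta[-1] + eps)
        deviations.append(delta)
    return deviations


def subpop_log_rt(log_r_ref, m, delta_k):
    """log R^u_k(t) = log R^u_0(t) + m + delta_k(t)."""
    return [lr + m + d for lr, d in zip(log_r_ref, delta_k)]


rng = random.Random(1)
phi = rng.betavariate(2, 40)      # prior draw: phi ~ beta(2, 40)
sigma = abs(rng.gauss(0.0, 0.3))  # assumption: Normal(0, 0.3) truncated at zero
log_r_ref = [math.log(1.1)] * 8   # hypothetical flat reference R^u_0(t) = 1.1
m = 0.05                          # hypothetical intercept value

# Three deviating subpopulations, i.e. K_subpops - 1 = 3.
deltas = simulate_deviations(n_deviations=3, n_weeks=8, phi=phi, sigma=sigma, rng=rng)
log_r_subpops = [subpop_log_rt(log_r_ref, m, d) for d in deltas]
```

Because $\mathrm{beta}(2, 40)$ has mean $2/42 \approx 0.05$, draws of $\varphi_{R(t)}$ are typically small, so each week's deviation carries over only weakly into the next.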

@dylanhmorris let me know if this reflects accurately our conversation, and if others agree with this approach.

@gvegayon @SamuelBrand1 @seabbs would love your thoughts as well

dylanhmorris commented 2 months ago

Looks largely good, but I think it can be simplified by writing more in terms of subpopulations (which may or may not have observed wastewater) and less in terms of wastewater sites.

Currently, we estimate a global undamped effective reproductive number $\mathcal{R}^\mathrm{u}(t)$. We now instead will estimate a single reference effective reproductive number $\mathcal{R}^0(t)$ with $K_{\mathrm{subpops}}-1$ deviations from the reference in the case of the wastewater informed model, where $K_{\mathrm{subpops}} = n_{\mathrm{sites}} +1$ if $\sum\nolimits_{k=1}^{K_\mathrm{sites}} n_k < n$ and $K_{\mathrm{subpops}} = n_{\mathrm{sites}}$ if $\sum\nolimits_{k=1}^{K_\mathrm{sites}} n_k > n$ or if there are no sites (in which case, no deviations are estimated).

Also, note that "if there are no sites (in which case, no deviations are estimated)" is a case of $K_{\mathrm{subpops}} = n_{\mathrm{sites}} +1$ if $\sum\nolimits_{k=1}^{K_\mathrm{sites}} n_k < n$ with $K_{\mathrm{subpops}} = 0 + 1 = 1$, so this sentence needs revision regardless:

where $K_{\mathrm{subpops}} = n_{\mathrm{sites}} +1$ if $\sum\nolimits_{k=1}^{K_\mathrm{sites}} n_k < n$ and $K_{\mathrm{subpops}} = n_{\mathrm{sites}}$ if $\sum\nolimits_{k=1}^{K_\mathrm{sites}} n_k > n$ or if there are no sites (in which case, no deviations are estimated).
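
The counting rule can be sketched in a few lines of Python, which also shows why the "no sites" clause is redundant. This is an illustration only and is not package code; `count_subpops` and `count_deviations` are hypothetical names. Treating $\sum n_k = n$ the same as the $> n$ case is an assumption, since only the strict inequalities are stated.

```python
def count_subpops(site_pops, total_pop):
    """Return K_subpops given site catchment populations n_k and total population n."""
    n_sites = len(site_pops)
    covered = sum(site_pops)
    if covered < total_pop:
        # Sites do not cover the whole population: add one subpopulation for
        # the remainder. With no sites this gives 0 + 1 = 1.
        return n_sites + 1
    # Assumption: equality is handled like the "> n" case.
    return n_sites


def count_deviations(site_pops, total_pop):
    """Number of deviations from the reference: K_subpops - 1."""
    return count_subpops(site_pops, total_pop) - 1


print(count_subpops([], 1000))          # no sites
print(count_subpops([200, 300], 1000))  # sites cover part of the population
print(count_subpops([600, 500], 1000))  # site populations sum to more than n
```

With no sites the first branch already returns one subpopulation and therefore zero deviations, so no separate rule is needed for that case.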

dylanhmorris commented 2 months ago

Also, I thought we had discussed additionally inferring an intercept for the "reference" subpopulation $\mathcal{R}(t)$:

$$ \log[\mathcal{R}^\mathrm{u}_{k}(t)] = \log[\mathcal{R}^\mathrm{u}_0(t)] + m + \delta_k(t) $$

Where $m$ is a fixed inferred parameter and allows for the fact that $\log[\mathcal{R}^\mathrm{u}_0(t)]$ may be the reference value but doesn't have to be the central value (except in the special case of 1 single (sub)population).

seabbs commented 2 months ago

> ... concentration data and should not have a patchwork or hierarchical population structure.

I think this should be "could not" rather than "should not". There are lots of reasons one might want a patch-based outbreak regardless of whether you have ww data.

kaitejohnson commented 2 months ago

> Also, I thought we had discussed additionally inferring an intercept for the "reference" subpopulation $\mathcal{R}(t)$:
>
> $$ \log[\mathcal{R}^\mathrm{u}_{k}(t)] = \log[\mathcal{R}^\mathrm{u}_0(t)] + m + \delta_k(t) $$
>
> Where $m$ is a fixed inferred parameter and allows for the fact that $\log[\mathcal{R}^\mathrm{u}_0(t)]$ may be the reference value but doesn't have to be the central value (except in the special case of 1 single (sub)population).

Whoops, yes, will edit and add this! Also updated to clarify based on the number of subpopulations.

kaitejohnson commented 2 months ago

> ... concentration data and should not have a patchwork or hierarchical population structure.
>
> I think this should be "could not" rather than "should not". There are lots of reasons one might want a patch-based outbreak regardless of whether you have ww data.

Would your thought be that the user can specify the number of patches in the absence of "patched" data @seabbs ?

dylanhmorris commented 2 months ago

> Would your thought be that the user can specify the number of patches in the absence of "patched" data @seabbs ?

One might have other subpopulation-based observables (e.g. subpopulation-level admissions data) or aggregate-level ww. Supporting this in the package is beyond the scope of this particular change/feature, but it is relevant to how we think about and write up the model.