Closed kaitejohnson closed 1 month ago
Looks largely good, but I think it can be simplified by writing more in terms of subpopulations (which may or may not have observed wastewater) and less in terms of wastewater sites.
Currently, we estimate a global undamped effective reproductive number $\mathcal{R}^\mathrm{u}(t)$. We now instead will estimate a single reference effective reproductive number $\mathcal{R}^0(t)$ with $K_{\mathrm{subpops}}-1$ deviations from the reference in the case of the wastewater informed model, where $K_{\mathrm{subpops}} = n_{\mathrm{sites}} + 1$ if $\sum\nolimits_{k=1}^{K_\mathrm{sites}} n_k < n$ and $K_{\mathrm{subpops}} = n_{\mathrm{sites}}$ if $\sum\nolimits_{k=1}^{K_\mathrm{sites}} n_k > n$ or if there are no sites (in which case, no deviations are estimated).
Also, note that "if there are no sites (in which case, no deviations are estimated)" is a case of $K_{\mathrm{subpops}} = n_{\mathrm{sites}} + 1$ if $\sum\nolimits_{k=1}^{K_\mathrm{sites}} n_k < n$ with $K_{\mathrm{subpops}} = 0 + 1 = 1$, so this sentence needs revision regardless:
where $K_{\mathrm{subpops}} = n_{\mathrm{sites}} + 1$ if $\sum\nolimits_{k=1}^{K_\mathrm{sites}} n_k < n$ and $K_{\mathrm{subpops}} = n_{\mathrm{sites}}$ if $\sum\nolimits_{k=1}^{K_\mathrm{sites}} n_k > n$ or if there are no sites (in which case, no deviations are estimated).
Also, I thought we had discussed additionally inferring an intercept for the "reference" subpopulation $\mathcal{R}(t)$:
$$ \log[\mathcal{R}^\mathrm{u}_{k}(t)] = \log[\mathcal{R}^\mathrm{u}_0(t)] + m + \delta_k(t) $$
Where $m$ is a fixed inferred parameter that allows for the fact that $\log[\mathcal{R}^\mathrm{u}_0(t)]$ may be the reference value but doesn't have to be the central value (except in the special case of a single (sub)population).
concentration data and should not have a patchwork or hierarchical population structure.
I think this should be "could not" rather than "should not". There are lots of reasons one might want a patch-based outbreak regardless of whether you have ww data.
Also, I thought we had discussed additionally inferring an intercept for the "reference" subpopulation $\mathcal{R}(t)$:
$$ \log[\mathcal{R}^\mathrm{u}_{k}(t)] = \log[\mathcal{R}^\mathrm{u}_0(t)] + m + \delta_k(t) $$
Where $m$ is a fixed inferred parameter that allows for the fact that $\log[\mathcal{R}^\mathrm{u}_0(t)]$ may be the reference value but doesn't have to be the central value (except in the special case of a single (sub)population).
Whoops yes will edit and add this! Also updated to clarify based on number of subpopulations
concentration data and should not have a patchwork or hierarchical population structure.
I think this should be "could not" rather than "should not". There are lots of reasons one might want a patch-based outbreak regardless of whether you have ww data.
Would your thought be that the user can specify the number of patches in the absence of "patched" data @seabbs ?
One might have other subpopulation-based observables (e.g. subpopulation-level admissions data) or aggregate-level ww. Supporting this in the package is beyond the scope of this particular change/feature, but it is relevant to how we think about and write up the model.
Edited to reflect additional intercept term and propose optional patchwork:
Goal
This should address #136, but also a more fundamental issue brought up by @damonbayer and @sbidari, which is that the current version of the "hospital admissions only" model is still going to estimate `n_subpops` infection dynamics, even though this will not be informed by wastewater concentration data and should not have a patchwork or hierarchical population structure. Instead, the desired behavior here is that:
... `renewal_ww_hosp.stan` model in the production/eval repo. Here in effect there will be $n_{\mathrm{sites}} = 0$, and the population of the "auxiliary site of those not captured by wastewater" is the total population, $n$.
... `n_subpops = n_sites` $\mathcal{R}(t)$ estimates, whose per capita infections generate expected counts in the total population. Here there will be no "auxiliary site", and the "reference" subpopulation will just be the largest wastewater catchment area.
This results in the following changes to the model definition:
Currently, we estimate a global undamped effective reproductive number $\mathcal{R}^\mathrm{u}(t)$. We will now instead estimate a reference undamped effective reproductive number $\mathcal{R}^\mathrm{u}_0(t)$ with $K_{\mathrm{subpops}}-1$ deviations from the reference.
The number of subpopulations falls under a few distinct cases:
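As a rough illustration of the case logic discussed in this thread, here is a minimal Python sketch. The function names `count_subpops` and `n_deviations` are hypothetical and do not exist in the package. The sketch assumes the rule quoted earlier in the thread (one extra "auxiliary" subpopulation when the site catchments sum to less than the total population $n$) and additionally assumes that catchments summing to exactly $n$ are treated like the "no auxiliary site" case, which the thread does not specify.

```python
def count_subpops(site_pops, total_pop):
    """Number of modeled subpopulations (hypothetical helper, not package code).

    site_pops: populations covered by each wastewater site catchment
    total_pop: total population n
    """
    n_sites = len(site_pops)
    if n_sites == 0:
        # No wastewater sites: the whole population is the only subpopulation.
        # (This is the same as the branch below, since 0 < n gives 0 + 1 = 1.)
        return 1
    if sum(site_pops) < total_pop:
        # Catchments do not cover everyone: add one auxiliary subpopulation
        # for people not captured by wastewater.
        return n_sites + 1
    # Assumption: catchments covering at least n people need no auxiliary site.
    return n_sites


def n_deviations(site_pops, total_pop):
    """Deviations from the reference: one fewer than the number of subpopulations."""
    return count_subpops(site_pops, total_pop) - 1
```

For example, with no sites this gives one subpopulation and zero deviations, which matches the "hospital admissions only" behavior described above.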
Thus we have the following proposed rewrite to the "Subpopulation level infections":
Subpopulation-level infections
We couple the subpopulation and total population infection dynamics at the level of the un-damped instantaneous reproduction number $\mathcal{R}^\mathrm{u}_ {0}(t)$.
We model the subpopulations as having infection dynamics that are similar to one another but can differ from the reference dynamic.
We represent this with a hierarchical model where we estimate a reference un-damped effective reproductive number $\mathcal{R}^\mathrm{u}_{0}(t)$ and then estimate the deviation of each individual subpopulation $k$'s value, $\mathcal{R}^{\mathrm{u}}_{k}(t)$, from the reference value.
The reference value for the undamped instantaneous reproductive number $\mathcal{R}^\mathrm{u}_0(t)$ follows the time-evolution described above. Subpopulation deviations from the reference reproduction number are modeled via a log-scale AR(1) process. Specifically, for subpopulation $k$:
$$ \log[\mathcal{R}^\mathrm{u}_{k}(t)] = \log[\mathcal{R}^\mathrm{u}_0(t)] + m +\delta_k(t) $$
where $m$ is an "intercept" for the reference subpopulation, which is a fixed inferred parameter and allows for the fact that $\log[\mathcal{R}^\mathrm{u}_0(t)]$ may be the reference value but doesn't have to be the central value.
$\delta_k(t)$ is the time-varying subpopulation effect on $\mathcal{R}^\mathrm{u}_0(t)$, modeled as
$$\delta_k(t) = \varphi_{R(t)} \delta_k(t-1) + \epsilon_{kt}$$
where $0 < \varphi_{R(t)} < 1$ and $\epsilon_{kt} \sim \mathrm{Normal}(0, \sigma_{R(t)\delta})$.
We chose a prior of $\varphi_{R(t)} \sim \mathrm{beta}(2,40)$ to impose limited autocorrelation in the week-by-week deviations. We set a weakly informative prior $\sigma_{R(t)\delta} \sim \mathrm{Normal}(0, 0.3)$ to allow for either limited or substantial site-to-site heterogeneity in $\mathcal{R}^\mathrm{u}_0(t)$, with the degree of heterogeneity inferred from the data.
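To make the proposed structure concrete, here is a minimal forward-simulation sketch in Python (standard library only). It is not the Stan implementation. The function name `simulate_subpop_rt` and its arguments are hypothetical, the deviation process is assumed to start from zero, and the Normal(0, 0.3) prior on the standard deviation is assumed to be truncated at zero (a half-normal). None of these choices are confirmed in this thread.

```python
import math
import random


def simulate_subpop_rt(log_r_ref, n_deviating, m=0.0, phi=None, sigma=None, seed=0):
    """Simulate R_k(t) for subpopulations that deviate from the reference.

    log_r_ref:   list of log reference R(t) values, one per time step
    n_deviating: number of subpopulations with an AR(1) deviation
    m:           intercept added to the log reference value
    phi, sigma:  AR(1) coefficient and innovation sd; drawn from the
                 priors described above when left as None
    """
    rng = random.Random(seed)
    if phi is None:
        phi = rng.betavariate(2, 40)  # limited week-to-week autocorrelation
    if sigma is None:
        sigma = abs(rng.gauss(0.0, 0.3))  # assumption: half-normal prior
    rt = []
    for _ in range(n_deviating):
        delta = 0.0  # assumption: deviations start at zero
        series = []
        for log_r in log_r_ref:
            # delta_k(t) = phi * delta_k(t-1) + eps_kt
            delta = phi * delta + rng.gauss(0.0, sigma)
            # log R_k(t) = log R_0(t) + m + delta_k(t)
            series.append(math.exp(log_r + m + delta))
        rt.append(series)
    return phi, sigma, rt


phi, sigma, rt = simulate_subpop_rt([0.0] * 8, n_deviating=3)
```

The Beta(2, 40) prior has mean 2/42, roughly 0.05, so under the prior the deviations are close to independent from week to week, and the stationary standard deviation $\sigma_{R(t)\delta}/\sqrt{1-\varphi_{R(t)}^2}$ is nearly equal to $\sigma_{R(t)\delta}$ itself.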
@dylanhmorris let me know if this reflects accurately our conversation, and if others agree with this approach.
@gvegayon @SamuelBrand1 @seabbs would love your thoughts as well