CDCgov / ww-inference-model

An in-development R package and a Bayesian hierarchical model jointly fitting multiple "local" wastewater data streams and "global" case count data to produce nowcasts and forecasts of both observations
https://cdcgov.github.io/ww-inference-model/
Apache License 2.0
16 stars 2 forks source link

Forward simulation spatially correlated unadjusted reproduction number #25

Closed cbernalz closed 2 months ago

cbernalz commented 3 months ago

Goal

We want to update the generate_simulated_data.R function in the R folder to implement a spatially correlated unadjusted reproduction number into it. We will also add necessary functions to allow different correlation structures to be used.

Context

The current model used in the generate_simulated_data.R function is listed in the model definitions in the repo. The change we are making will be at the unadjusted reproduction level. We will add the spatial correlations here with the following model :

Let $k$ represent a site, from sites $1$ to $n_k$. Let $R_k^u(t)$ be the unadjusted reproduction number at time $t$ at site $k$. So lets look at the combined "state" $R^u(t)$ that contains an AR(1) correlation structure on the first differences that is currently being used. So

$$ \log R^u(t) \sim \text{Normal}(\log R^u(t - 1) + \beta(\log R^u(t - 1) - \log R^u(t - 2)), \sigma_r), $$

or

$$ \log R^u(t) = \log R^u(t - 1) + \beta(\log R^u(t - 1) - \log R^u(t - 2)) + \sigma_r\text{Normal}(0,1). $$

Now let $\delta_k(t)$ be the time-varying subpopulation effect on $\log R^u(t)$. So we can look at the vector of all $\log R_k^u(t)$'s, so

$$ \begin{pmatrix} \log R1^u(t)\ \vdots\ \log R{n_{k}}^u(t)\ \end{pmatrix} = \log R^u(t) \begin{pmatrix} 1\ \vdots\ 1\ \end{pmatrix} + \vec{\delta}(t). $$

Where $\vec{\delta}(t) = \left(\delta1(t),...,\delta{nk}(t)\right)^{T}$. We also have that $\vec{\delta}(t) = \varphi{R(t)}\vec{\delta}(t - 1) + \vec{\epsilon}$, where $\vec{\epsilon}$ $\sim$ $\text{Normal}_{nk}(0,\mathbf{\Sigma\{\epsilon}}).$

This adds a spatial correlation and also the current AR(1) temporal correlation to this $\vec{\delta}(\cdot)$ random variable. Like before, we have that $0 < \varphi_{R(t)} < 1$.

For $\mathbf{\Sigma{\epsilon}}$ $=$ $\sigma{\epsilon}^2\mathbf{\Omega}$, where $\Omega_{ij} \in \mathbf{\Omega}$ are build from the correlation functions. These include an exponential decay, Matern, spherical, rational quadratic, or, to return to the current model framework, independence correlation functions.

For simplicity, we will first assume an adjacency matrix is provided with the data or locations of sites where we can attain an adjacency matrix.

We need to make sure to take into account is the individuals that are not included in a site. So, currently, we will provide them with their own process. This separate process is as follows :

Lets denote the unadjusted reproduction number at time $t$ for individuals who are not represented by the actual sites as $R_{\text{aux}}^u(t)$.

$\log R_{\text{aux}}^u(t) = \log R^u(t) + \delta_{\text{aux}}(t),$

$\delta_{\text{aux}}(t) = \phi_{R(t)}\delta_{\text{aux}}(t - 1) + \epsilon_{\text{aux}},$

$\epsilon_{\text{aux}} \sim \text{Normal}(0,\text{ScalingFactor}\times\sigma_{\epsilon}),$

$\text{ScalingFactor} \sim \text{Log-Normal}(\mu_{\text{ScalingFactor}},\sigma_{\text{ScalingFactor}}^2).$

This process is very similar with the previous way site level Rt was calculated, but here we include the ScalingFactor to scale the variation of the spatial process. Also, if we have an independent structure, we will not have this ScalingFactor and will instead use the $n_k + 1$ approach currently being implemented.

Requirements