Stan spatial functions.

cbernalz commented 1 month ago

Goal

We want to create STAN functions that are similar to the R functions for the spatial components of the model.

Context

The following model will be added as STAN functions : Let $k$ represent a site, from sites $1$ to $n_k$. Let $R_k^u(t)$ be the unadjusted reproduction number at time $t$ at site $k$. So lets look at the combined "state" $R^u(t)$ that contains an AR(1) correlation structure on the first differences that is currently being used. So

$$ \log R^u(t) \sim \text{Normal}(\log R^u(t - 1) + \beta(\log R^u(t - 1) - \log R^u(t - 2)), \sigma_r), $$

or

$$ \log R^u(t) = \log R^u(t - 1) + \beta(\log R^u(t - 1) - \log R^u(t - 2)) + \sigma_r\text{Normal}(0,1). $$

Now let $\delta_k(t)$ be the time-varying subpopulation effect on $\log R^u(t)$. So we can look at the vector of all $\log R_k^u(t)$'s, so

$$ \begin{pmatrix} \log R1^u(t)\ \vdots\ \log R{n_{k}}^u(t)\ \end{pmatrix} = \log R^u(t) \begin{pmatrix} 1\ \vdots\ 1\ \end{pmatrix} + \vec{\delta}(t). $$

Where $\vec{\delta}(t) = \left(\delta1(t),...,\delta{nk}(t)\right)^{T}$. We also have that $\vec{\delta}(t) = \varphi{R(t)}\vec{\delta}(t - 1) + \vec{\epsilon}$, where $\vec{\epsilon}$ $\sim$ $\text{Normal}_{nk}(0,\mathbf{\Sigma\{\epsilon}}).$

This adds a spatial correlation and also the current AR(1) temporal correlation to this $\vec{\delta}(\cdot)$ random variable. Like before, we have that $0 < \varphi_{R(t)} < 1$.

For $\mathbf{\Sigma{\epsilon}}$ $=$ $\sigma{\epsilon}^2\mathbf{\Omega}$, where $\Omega_{ij} \in \mathbf{\Omega}$ are build from the correlation functions. These include an exponential decay, Matern, spherical, rational quadratic, or, to return to the current model framework, independence correlation functions.

For simplicity, we will first assume an distance matrix is provided with the data or locations of sites where we can attain an distance matrix.

We need to make sure to take into account is the individuals that are not included in a site. So, currently, we will provide them with their own process. This separate process is as follows :

Lets denote the unadjusted reproduction number at time $t$ for individuals who are not represented by the actual sites as $R_{\text{aux}}^u(t)$.

$\log R_{\text{aux}}^u(t) = \log R^u(t) + \delta_{\text{aux}}(t),$

$\delta_{\text{aux}}(t) = \phi_{R(t)}\delta_{\text{aux}}(t - 1) + \epsilon_{\text{aux}},$

$\epsilon_{\text{aux}} \sim \text{Normal}(0,\text{ScalingFactor}\times\sigma_{\epsilon}),$ This process is very similar with the previous way site level Rt was calculated, but here we include the ScalingFactor to scale the variation of the spatial process. Also, if we have an independent structure, we will not have this ScalingFactor and will instead use the $n_k + 1$ approach currently being implemented. ## Requirements - [x] Add `spatial_rt_process.stan` - [x] Add `independence_corr_func.stan` - [x] Add `exponential_corr_func.stan` - [x] Add `aux_site_process.stan`

athowes commented 1 month ago

Note that people have implemented spatial models in Stan before (e.g. geostan) / and Stan may have native support for some spatial models.

Also regarding "want to create Stan functions that are similar to the R functions": one thing you can do here is export the Stan functions into R and create unit tests that they are identical to the R functions (to be sure they are identical).

cbernalz commented 1 month ago

@athowes Thank you for the reference, I will take a look at it. As for the unit tests, I did think about that, but one problem I could be wrong about is that one of the functions that I am making requires requires generating from a multivariate normal. Since these are two different languages I think this won't match up even with a seed, but I would be wrong on this.

athowes commented 1 month ago

Which function simulates a multivariate normal? Any of these .stan functions shouldn't be simulating.

(Also if you have two simulators you can take draws from both and compare the distributions as being the same. I've done this using Kolmogorov Smirnov tests before.)

athowes commented 1 month ago

For simplicity, we will first assume an adjacency matrix is provided with the data or locations of sites where we can attain an adjacency matrix.

Are you using an adjacency matrix based approach or one which uses distances between sites passed into a correlation function e.g. Matern? Edit: or it's that you consider spatial-correlation with a distance-based kernel for only adjacent sites?

cbernalz commented 1 month ago

That should be distance matrix, sorry about that. And we would be using any correlation function that takes those distances, currently we only have an exponential decay and on that takes us back to the independence framework.

cbernalz commented 1 month ago

The function that will be simulating the multivariate normal is the spatial_rt_process.R function in this PR : https://github.com/CDCgov/ww-inference-model/pull/26 . We want to make all the new spatial process functions that are in .R into .stan files so we can eventually place them into the full ww-inference model in stan.

kaitejohnson commented 1 month ago

Which function simulates a multivariate normal? Any of these .stan functions shouldn't be simulating.

(Also if you have two simulators you can take draws from both and compare the distributions as being the same. I've done this using Kolmogorov Smirnov tests before.)

@cbernalz I think what @athowes means here (@athowes correct me if I am wrong) is that any of the stan functions should not be doing the equivalent of mvrnorm() which would be something like mvn_normal_rng() in stan. The stan functions should just be deterministic transformations of the input variables.

So I think we will have to think through a bit how to implement something like this, which draws from a MVN-- https://github.com/CDCgov/ww-inference-model/blob/6c8f6c0c4208e150c96007123cc29f3fb5942f1e/R/spatial_rt_process.R#L38.

Most likely we will need a Non-centered parameterization with the MVN, I know there's a way to do this, I don't know it off the top of my head...

kaitejohnson commented 1 month ago

And @athowes definitely agree re unit tests comparing the stan functions to the R functions -- we have a few of these in the current package and the plan was to have more of them here!

athowes commented 1 month ago

Would you be able to write down the model as a GLMM? For example if it were to look like

$$ \text{R}(s, t) = \beta_0 + \text{space}(s) + \text{time}(t) + \text{spacetime}(s, t) $$

then what would the models for $\text{space}(s)$, $\text{time}(t)$, and $\text{spacetime}(s, t)$ be? For example, it seems like $\text{time}(t) \sim \text{AR}(1)$.

Also as a possible option, if you have the model implemented in Stan and use flags for likelihood then I think you could simulate data from the prior (using HMC). This would allow you to have one source of this code (in Stan) rather than both the R and the Stan code that you validate are the same. That said, it's easier to work with R code.

seabbs commented 1 month ago

I like the decomposition idea and the suggestion to look at the (quite a few) places people have implemented spatial models in stan before

cbernalz commented 1 month ago

This is how I believe we can write this model as a GLMM, please let me know your opinions or if it should look different, @athowes and @seabbs :

$$ R(t,k) = R^{\text{state}}(t) + \phi_{R(t)}\delta(t - 1) + \epsilon(k) $$

Here I would think that $R^{\text{state}}(t)$ acts as the $\beta0$, $\phi{R(t)}\delta(t - 1)$ acts as the $\text{time}(t)$, and $\epsilon(k)$ would be the $\text{space}(k)$. The only thing we would probably need to add would be an error term with constant variance and no correlation centered at zero to be the $\text{spacetime}(t,k)$. My confusion here would lie with $R^{\text{state}}(t)$ being the $\beta_0$ because it depends on time, would this term go into the $\text{time}(t)$ and then we have $\beta_0 = 0$?

athowes commented 1 month ago

You're looking to model one individual state (rather than the whole country)?
What's $\phi_{R(t)}$? Sorry if I'm missing it above
$\delta(t)$ is a vector comprised of individual $\delta_k(t)$ terms? So somehow it looks like this middle term corresponds to $\text{spacetime}(t,k)$ as it depends on both $t$ and $k$? So my reading is that there is a state-level temporal trend, site-level spatial term, and then some kind of site-time interaction term (but so far it doesn't seem to be being framed as such)
- If it is a site-time time interaction, what is the model on it? Could you specify it in terms of Knorr-Held Type I, II, III, IV
What model do you have on $\epsilon(k)$?
About the "My confusion here would lie with" -- yes I think this is fine, I'd imagine that the model $R^{\text{state}}(t)$ might be some kind of time series model already that includes an intercept rather than being centered at zero say

Also a note that could be useful: something like the $\text{AR}(1)$ model (or other autoregressive models) has a difference formulation, but it also has a joint formulation. You can write the joint covariance (or precision) matrix of the resulting structured multivariate normal. Could you get to something similar for the term $\phi_{R(t)}\delta(t - 1)$?

kaitejohnson commented 1 month ago

You're looking to model one individual state (rather than the whole country)?

Yeah for now, some states have as many as 140 wastewater sites in them so we are just doing within-state fits. Eventually would be great to add another layer of hierarchy and jointly estimate across states but that is not within scope here.

What's ϕR(t)? Sorry if I'm missing it above

$\phi{R(t)}$ is the temporal autoregulation coefficient I believe this was written as $\psi{R(t)}$ originally but correct me if I am wrong @cbernalz

δ(t) is a vector comprised of individual δk(t) terms? So somehow it looks like this middle term corresponds to spacetime(t,k) as it depends on both t and k? So my reading is that there is a state-level temporal trend, site-level spatial term, and then some kind of site-time interaction term (but so far it doesn't seem to be being framed as such)

I think as written, there is a state-level temporal trend, a site-level temporal trend (AR(1) process in how much deviation from the state-level trend) and a site-level spatial trend ($\epsilon_k$). I don't think there is an interaction term, but I could be misunderstanding.

If it is a site-time time interaction, what is the model on it? Could you specify it in terms of Knorr-Held Type I, II, III, IV

What model do you have on ϵ(k)?

So the idea here is we're just replacing completely unspatially correlated "error" with spatially correlated error via a draw from a MVN with a spatially informed covariance matrix, compared to the current implementation which is just a draw from a vector of standard normals.

Also a note that could be useful: something like the AR(1) model (or other autoregressive models) has a difference formulation, but it also has a joint formulation. You can write the joint covariance (or precision) matrix of the resulting structured multivariate normal. Could you get to something similar for the term ϕR(t)δ(t−1)?

Would this imply that there is a correlation structure between time points?

athowes commented 1 month ago

some states have as many as 140 wastewater sites in them

Do you have coordinates for the different sites, and using those as the spatial location, or do you have areas for the sites? If you have coordinates, then how are you moving from this to a map (i.e. generalising outside the sites).

I don't think there is an interaction term, but I could be misunderstanding.

On call @dylanhmorris suggested that there was some kind of spatio-temporal interaction here, so would be nice to get that ironed out.

kaitejohnson commented 1 month ago

Closing this issue since the tasks described are done! Happy to continue discussion here to elsewhere

CDCgov / ww-inference-model

Stan spatial functions. #28

Goal

Context