CDCgov / wastewater-informed-covid-forecasting

Wastewater-informed COVID-19 forecasting models submitted to the COVID-19 Forecast Hub
https://cdcgov.github.io/wastewater-informed-covid-forecasting/
Apache License 2.0

Implement MVN to estimate a site-site covariance matrix #79

Open kaitejohnson opened 1 week ago

kaitejohnson commented 1 week ago

Problem

The current implementation estimates the deviation from the state mean R(t) independently at each time point for each site. It is likely that sites' deviations from the state mean are correlated. An MVN can help estimate this correlation.

Context:

Current implementation assumes that, for each site $k$:

$$\log R_{t,k} \sim N(\log(\mu_{R,t}), \sigma_{R,t})$$

or if we get rid of the time-varying sd, just:

$$\log R_{t,k} \sim N(\log(\mu_{R,t}), \sigma_R)$$

Instead, we could write this as

$$\log([R_{1,k}, \dots, R_{t,k}]) \sim MVN(\log([\mu_{R,1}, \dots, \mu_{R,t}]), \Sigma)$$

The difference here is that we are estimating a single covariance matrix, $\Sigma$, that describes a constant correlation structure between sites. This would mean that site-level deviations are no longer treated as independent at each time step, which should in theory make estimation easier when we have few samples from one site or reporting delays at a site, since we'd know how that site is correlated with the other sites (which have already reported).
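To make the "sites which have already reported" argument concrete, here is a minimal sketch of the two-site case in Python. It assumes the MVN is read as being over sites at a fixed time $t$ (so $\Sigma$ is site-by-site), uses the standard conditional distribution of a bivariate normal, and all numbers and the helper name `conditional_log_r` are made up for illustration. It is not the package's model code.

```python
import math


def conditional_log_r(log_mu, sd_a, sd_b, rho, log_r_b):
    """Conditional mean and sd of site A's log R(t) given site B's log R(t).

    Hypothetical helper. Assumes (log R_A, log R_B) is bivariate normal with
    a shared mean log_mu, site-level sds sd_a and sd_b, and correlation rho.
    """
    cond_mean = log_mu + rho * (sd_a / sd_b) * (log_r_b - log_mu)
    cond_sd = sd_a * math.sqrt(1.0 - rho ** 2)
    return cond_mean, cond_sd


# Illustrative values only: state-level R(t) = 1.0, site B observed at R = 1.2.
log_mu = math.log(1.0)
log_r_b = math.log(1.2)

independent = conditional_log_r(log_mu, 0.2, 0.2, rho=0.0, log_r_b=log_r_b)
correlated = conditional_log_r(log_mu, 0.2, 0.2, rho=0.8, log_r_b=log_r_b)

print("rho = 0.0:", independent)  # site B carries no information about site A
print("rho = 0.8:", correlated)   # mean is pulled toward site B and the sd shrinks
```

With `rho = 0` the conditional distribution for site A is just the state-level prior, which is what the independent model implies. With a positive `rho`, the unreported site borrows information from the reported one.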

This is similar in vein to what @zsusswein did in the Stan model for global variant tracking, and we can look at this repo for inspo if we need a non-centered parameterization.

zsusswein commented 1 week ago

I link it in the code, but I really just followed this guide in the Stan docs. It's a nice step-by-step explanation.
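For readers unfamiliar with the term, the general idea of a non-centered MVN parameterization can be sketched as follows: draw independent standard normals `z`, then scale and correlate them with a Cholesky factor of the correlation matrix. This is a rough illustration in plain Python rather than Stan, not the code from either repo; the three-site correlation matrix, the sds, and the function names are all assumptions chosen for the example.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                chol[i][j] = (matrix[i][j] - s) / chol[j][j]
    return chol


def noncentered_draw(log_mu, sds, chol_corr, rng):
    """One draw of site-level log R: log_mu + diag(sds) * chol_corr * z, z ~ N(0, I)."""
    n = len(sds)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [
        log_mu + sds[i] * sum(chol_corr[i][k] * z[k] for k in range(i + 1))
        for i in range(n)
    ]


def correlation(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical site-by-site correlation matrix and site-level sds (3 sites).
corr = [[1.0, 0.7, 0.3],
        [0.7, 1.0, 0.5],
        [0.3, 0.5, 1.0]]
sds = [0.2, 0.3, 0.25]
chol_corr = cholesky(corr)

rng = random.Random(1)
draws = [noncentered_draw(math.log(1.1), sds, chol_corr, rng) for _ in range(20000)]
empirical_01 = correlation([d[0] for d in draws], [d[1] for d in draws])
print("target correlation 0.7, empirical:", round(empirical_01, 3))
```

In a Stan model the sampler would work on `z`, the sds, and the Cholesky factor directly, which is usually what makes the geometry easier to sample than placing the MVN prior on the site-level values themselves.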

kaitejohnson commented 1 week ago

@cbernalz This is likely relevant to the spatial correlation stuff (but there are many options for how you'd want to implement it).

cbernalz commented 1 week ago

@kaitejohnson This was my initial idea on how to do it, but I am currently in the process of understanding how your current model does the expected Rt. I think putting together a simulation of your current model should be the last thing before I start adding spatial correlations.

kaitejohnson commented 1 week ago

For sure! Would recommend starting with this https://github.com/CDCgov/wastewater-informed-covid-forecasting/blob/prod/cfaforecastrenewalww/R/generate_simulated_data.R and the vignette! https://github.com/CDCgov/wastewater-informed-covid-forecasting/blob/prod/cfaforecastrenewalww/vignettes/toy_data_vignette.Rmd

seabbs commented 5 days ago

I did this here as well, for Rt across two variants: https://github.com/epiforecasts/forecast.vocs/blob/eb475ff31025f6bb1f34e2e90ce158ce902c6033/inst/stan/twostrainbp.stan#L95

I remember it being slightly tricky to set up the more optimal parameterization, but it ultimately worked fairly well.

I also remember that writing it down (https://epiforecasts.io/forecast.vocs/articles/model-definitions.html) really helped.