CDCgov / ww-inference-model

An in-development R package and a Bayesian hierarchical model jointly fitting multiple "local" wastewater data streams and "global" case count data to produce nowcasts and forecasts of both observations
https://cdcgov.github.io/ww-inference-model/
Apache License 2.0

2024-08-08 update: adding spatial to wwinference model #56

Closed. cbernalz closed this pull request 1 month ago.

cbernalz commented 2 months ago

This PR addresses the following issue: https://github.com/CDCgov/ww-inference-model/issues/44#issue-2445019536. The model being added comes from this issue: https://github.com/CDCgov/ww-inference-model/issues/25#issue-2409527383.

cbernalz commented 2 months ago

This is not ready yet, but I wanted to get input from both of you: @dylanhmorris and @kaitejohnson. The spatial components are in the model, and when I ran everything on the test file, the results definitely changed. There is a lot more variability, mainly in the forecast period: R(t) takes values above 2, and the site R(t)s occasionally exceed 5.

kaitejohnson commented 2 months ago

Interesting! How does that compare to the ground truth R(t) values in each subpopulation? Might be worth making a plot of the estimated R(t)s and the known R(t)s that you simulated.

Also, how tight are your priors around the ground truth values of $\phi$ and $\sigma_{\epsilon}$? To start, we might want to force those to be pretty tightly specified just to see if that allows us to reproduce the ground truth results (so cheating a little, but enforcing identifiability).
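To complement a plot of the estimated R(t)s against the simulated ones, a quick numeric check is the fraction of ground-truth values that land inside the posterior intervals. This is a minimal sketch, assuming you have lower and upper interval bounds (for example 90% credible intervals) for one subpopulation's R(t) at each time point; `interval_coverage` is a hypothetical helper, not part of wwinference, and the numbers are illustrative only.

```python
from typing import Sequence


def interval_coverage(
    truth: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> float:
    """Return the fraction of ground-truth values inside [lower, upper]."""
    if not (len(truth) == len(lower) == len(upper)):
        raise ValueError("truth, lower and upper must have the same length")
    if len(truth) == 0:
        raise ValueError("need at least one time point")
    # Each comparison is a bool; summing counts how many fall inside.
    inside = sum(lo <= t <= hi for t, lo, hi in zip(truth, lower, upper))
    return inside / len(truth)


# Illustrative numbers only (not results from this PR): one subpopulation, 4 days.
true_rt = [1.0, 1.1, 1.2, 1.3]
lower_90 = [0.9, 1.0, 1.25, 1.1]
upper_90 = [1.1, 1.3, 1.5, 1.6]
print(interval_coverage(true_rt, lower_90, upper_90))  # 0.75
```

Running this per subpopulation, and separately for the fit and forecast periods, would show whether the extra variability is calibrated (coverage near the nominal level with wide intervals) or whether the estimates are actually missing the truth.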

cbernalz commented 2 months ago

Yes, I experimented with these values and their magnitudes, but what seems to make the values explode is sigma_generalized, the generalized variance formulation that comes from normalizing the omega matrix. When I removed this part from the model and used the traditional decomposition of $\Sigma$, the site R(t) plots showed values closer to what I believe the true ones are.

I may have written the generalized variance into Stan incorrectly, so it would be good if @kaitejohnson and @dylanhmorris could look over that part, at line 236 of the wwinference.stan model.
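As a reference point for reviewing that part, here is a minimal Python sketch of the two formulations. It assumes the generalized-variance version rescales the correlation matrix $\Omega$ so that $\det(\Omega) = 1$ and then multiplies by $c$, giving $\det(\Sigma)^{1/n} = c$, while the traditional decomposition is $\Sigma = \operatorname{diag}(\sigma)\,\Omega\,\operatorname{diag}(\sigma)$. The function names and the exact normalization are assumptions for illustration, not the code in wwinference.stan.

```python
from typing import List

Matrix = List[List[float]]


def determinant(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    n = len(m)
    a = [row[:] for row in m]  # work on a copy so the input is not modified
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return det


def generalized_variance_cov(omega: Matrix, c: float) -> Matrix:
    """Assumed form: Sigma = c * Omega / det(Omega)^(1/n), so det(Sigma)^(1/n) == c."""
    n = len(omega)
    d = determinant(omega)
    if d <= 0.0:
        raise ValueError("omega must have a positive determinant")
    scale = c / d ** (1.0 / n)
    return [[scale * x for x in row] for row in omega]


def traditional_cov(omega: Matrix, sigmas: List[float]) -> Matrix:
    """Sigma = diag(sigmas) @ Omega @ diag(sigmas), with sigmas the per-site SDs."""
    n = len(omega)
    return [[sigmas[i] * omega[i][j] * sigmas[j] for j in range(n)] for i in range(n)]


omega = [[1.0, 0.9], [0.9, 1.0]]
sigma_gv = generalized_variance_cov(omega, c=1.0)
print(round(sigma_gv[0][0], 2))  # marginal variance of site 1: 2.29
```

One consequence of this assumed normalization: each site's marginal variance is $c / \det(\Omega)^{1/n}$, which exceeds $c$ whenever sites are correlated, because a correlation matrix has $\det(\Omega) \le 1$. With two sites at correlation 0.9, $\det(\Omega) = 0.19$ and the marginal variance is about $2.3c$, so a prior on $c$ that looks reasonable could still imply fairly wide site-level R(t).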

dylanhmorris commented 2 months ago

@cbernalz can you make some plots and share them in the other repo?

kaitejohnson commented 2 months ago

You could share them here, since it's only simulated data?

cbernalz commented 2 months ago

@dylanhmorris and @kaitejohnson I am currently adjusting the generate_simulation_data function to use the generalized variance decomposition; I think this may have been the problem with the fitting. If the site R(t)s have generalized variance $c$, what variance should I use for the auxiliary site process? Would I use the same $c$, so that the auxiliary site deviations follow $\text{Normal}(0, \text{scalingfactor} \cdot c)$?

kaitejohnson commented 2 months ago

Yep, I think that is the idea! If $\text{scalingfactor} = 1$, then on average the auxiliary site subpopulation would deviate from the state R(t) by the same amount as the sites do (averaged across all sites). But @dylanhmorris can correct me if this interpretation is wrong.
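A minimal sketch of the auxiliary site draw, assuming $c$ is a variance, so $\text{Normal}(0, \text{scalingfactor} \cdot c)$ is read as mean 0 and variance $\text{scalingfactor} \cdot c$. Python's `random.gauss`, like Stan's `normal` and R's `rnorm`, takes a standard deviation, so the code takes a square root; passing $\text{scalingfactor} \cdot c$ directly would silently use it as an SD. The function names are hypothetical, and the scale the deviations live on (natural or log R(t)) is left to the model.

```python
import math
import random
from typing import List


def aux_site_deviation_sd(c: float, scaling_factor: float) -> float:
    """SD of aux-site deviations whose *variance* is scaling_factor * c."""
    if c <= 0.0 or scaling_factor <= 0.0:
        raise ValueError("c and scaling_factor must be positive")
    return math.sqrt(scaling_factor * c)


def draw_aux_deviations(
    c: float, scaling_factor: float, n_draws: int, seed: int = 0
) -> List[float]:
    """Draw deviations ~ Normal(mean 0, variance scaling_factor * c)."""
    rng = random.Random(seed)  # seeded generator for reproducible simulations
    sd = aux_site_deviation_sd(c, scaling_factor)
    return [rng.gauss(0.0, sd) for _ in range(n_draws)]


print(aux_site_deviation_sd(c=0.25, scaling_factor=1.0))  # 0.5
devs = draw_aux_deviations(c=0.25, scaling_factor=1.0, n_draws=10, seed=42)
```

On the "on average" interpretation: if the generalized variance is defined so that $c = \det(\Sigma)^{1/n}$ (an assumption here), then $c$ is the geometric mean of the eigenvalues of $\Sigma$. It equals every site's marginal variance when the sites are independent ($\Omega = I$), so with scalingfactor = 1 the auxiliary subpopulation matches each site exactly in that case; with correlated sites the match holds only in that geometric-mean sense.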

cbernalz commented 2 months ago

Great, @kaitejohnson and @dylanhmorris: I figured out that the problem was the strict prior placed on sigma_generalized. The plots are in the issue on the ent repo.

kaitejohnson commented 1 month ago

Weirdly, I am not able to see the full set of CI jobs (including the failure) for check-package.

cbernalz commented 1 month ago

@dylanhmorris and @kaitejohnson I addressed all of your comments. The biggest change is that I added the option to use either an independent correlation function (the default if no dist_matrix is provided) or an exponential decay correlation function (if dist_matrix is provided). Let me know what should be changed!
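A minimal sketch of that switch, assuming the exponential decay form $\Omega_{ij} = \exp(-d_{ij}/\phi)$ with a positive length scale $\phi$. Some implementations use $\exp(-\phi\, d_{ij})$ instead, and whether this $\phi$ is the same parameter discussed earlier in the thread is not stated here. `site_corr` and its helpers are hypothetical names, not the package's API.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def independent_corr(n_sites: int) -> Matrix:
    """Identity matrix: sites are uncorrelated."""
    return [[1.0 if i == j else 0.0 for j in range(n_sites)] for i in range(n_sites)]


def exponential_decay_corr(dist_matrix: Matrix, phi: float) -> Matrix:
    """Assumed form: Omega_ij = exp(-d_ij / phi)."""
    if phi <= 0.0:
        raise ValueError("phi must be positive")
    return [[math.exp(-d / phi) for d in row] for row in dist_matrix]


def site_corr(
    n_sites: int, dist_matrix: Optional[Matrix] = None, phi: float = 1.0
) -> Matrix:
    """Independent correlation by default; exponential decay if distances are given."""
    if dist_matrix is None:
        return independent_corr(n_sites)
    if len(dist_matrix) != n_sites or any(len(row) != n_sites for row in dist_matrix):
        raise ValueError("dist_matrix must be n_sites x n_sites")
    return exponential_decay_corr(dist_matrix, phi)


print(site_corr(2))  # [[1.0, 0.0], [0.0, 1.0]]
print(site_corr(2, [[0.0, 2.0], [2.0, 0.0]], phi=2.0))
```

With zeros on the diagonal of the distance matrix, the exponential form always has 1s on its diagonal, and as $\phi \to 0$ the off-diagonal entries go to 0, so the identity default behaves like the limiting case. For distinct sites with Euclidean distances the exponential form gives a valid (positive definite) correlation matrix; for other distance measures, such as travel distances, that is not guaranteed.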