CDCgov / ww-inference-model

An in-development R package and a Bayesian hierarchical model jointly fitting multiple "local" wastewater data streams and "global" case count data to produce nowcasts and forecasts of both observations
https://cdcgov.github.io/ww-inference-model/
Apache License 2.0
16 stars 2 forks source link

Modularize the functions in the `generate_simulated_data()` function #29

Closed kaitejohnson closed 1 month ago

kaitejohnson commented 3 months ago

This PR closes #3 and attempts to make the generate_simulated_data() function a series of calls to other functions that represent the critical model components, with the goal being that you can swap in and out different functions for different models (for example how to get the subpopulation R(t) estimates).

kaitejohnson commented 2 months ago

@dylanhmorris This is ready for re-review. I could combine more things into larger wrapper functions, but my preference is that each individual component have its own function. It does still mean that the generate_simulated_data() function is large and verbose, but I don't think there's a great way to avoid this in a complex model.

kaitejohnson commented 2 months ago

@dylanhmorris Let me know if this is good to merge, I made a separate issue for specifying all inputs https://github.com/CDCgov/ww-inference-model/issues/61

kaitejohnson commented 2 months ago

@dylanhmorris This addresses #65 and updates the package data accordingly.

kaitejohnson commented 2 months ago

@damonbayer This PR adds the true_global_rt dataframe to the package data. This contains the date and time columns and the unadj_rt and the realized_rt.

kaitejohnson commented 2 months ago

@dylanhmorris @gvegayon Ready for re-review.

I think it makes sense to make a separate issue to look into the back-calculation of R(t) and see why we are not able to reproduce a flat repeated value each day. I think it might have something to do with indexing/initialization but I'd want to look into it. Should be easier to do with a more functionalized version of the generate_simulated_data() function.

kaitejohnson commented 2 months ago

Ok I think this fixes the back-calculation of R(t), by padding a 0 to the generation interval which replicates the forward image

This looks better and gets rid of the weird jump we saw

kaitejohnson commented 1 month ago

@dylanhmorris Let me know if there are small changes needed to get this through review. Would be ideal to fix some of this before we have people using it, even if just prototyping. Apologies that it became such a large PR, but I think most of it has been reviewed by now :(

kaitejohnson commented 1 month ago

LGTM, thanks @kaitejohnson. Can you make a separate issue to reconsider the R(t) additional noise in the simulated data?

Sure, you mean to consider the noise being generated in log scale?

dylanhmorris commented 1 month ago

LGTM, thanks @kaitejohnson. Can you make a separate issue to reconsider the R(t) additional noise in the simulated data?

Sure, you mean to consider the noise being generated in log scale?

But also whether to have a user-specified timeseries with added noise at all (versus either/both of generating the $\mathcal{R}(t)$ from the underlying stochastic process or taking a user-specified timeseries as is (which the user can generate with whatever process they like)

kaitejohnson commented 1 month ago

LGTM, thanks @kaitejohnson. Can you make a separate issue to reconsider the R(t) additional noise in the simulated data?

Sure, you mean to consider the noise being generated in log scale?

But also whether to have a user-specified timeseries with added noise at all (versus either/both of generating the R ( t ) from the underlying stochastic process or taking a user-specified timeseries as is (which the user can generate with whatever process they like)

Yeah, I guess the "added noise" could certainly happen outside the function in any simulation analysis.