Closed kaitejohnson closed 1 month ago
@dylanhmorris This is ready for re-review. I could combine more things into larger wrapper functions, but my preference is that each individual component have its own function. It does still mean that the generate_simulated_data()
function is large and verbose, but I don't think there's a great way to avoid this in a complex model.
@dylanhmorris Let me know if this is good to merge, I made a separate issue for specifying all inputs https://github.com/CDCgov/ww-inference-model/issues/61
@dylanhmorris This addresses #65 and updates the package data accordingly.
@damonbayer This PR adds the true_global_rt
dataframe to the package data. This contains the date and time columns and the unadj_rt
and the realized_rt
.
@dylanhmorris @gvegayon Ready for re-review.
I think it makes sense to make a separate issue to look into the back-calculation of R(t) and see why we are not able to reproduce a flat repeated value each day. I think it might have something to do with indexing/initialization but I'd want to look into it. Should be easier to do with a more functionalized version of the generate_simulated_data()
function.
Ok I think this fixes the back-calculation of R(t), by padding a 0 to the generation interval which replicates the forward
This looks better and gets rid of the weird jump we saw
@dylanhmorris Let me know if there are small changes needed to get this through review. Would be ideal to fix some of this before we have people using it, even if just prototyping. Apologies that it became such a large PR, but I think most of it has been reviewed by now :(
LGTM, thanks @kaitejohnson. Can you make a separate issue to reconsider the R(t) additional noise in the simulated data?
Sure, you mean to consider the noise being generated in log scale?
LGTM, thanks @kaitejohnson. Can you make a separate issue to reconsider the R(t) additional noise in the simulated data?
Sure, you mean to consider the noise being generated in log scale?
But also whether to have a user-specified timeseries with added noise at all (versus either/both of generating the $\mathcal{R}(t)$ from the underlying stochastic process or taking a user-specified timeseries as is (which the user can generate with whatever process they like)
LGTM, thanks @kaitejohnson. Can you make a separate issue to reconsider the R(t) additional noise in the simulated data?
Sure, you mean to consider the noise being generated in log scale?
But also whether to have a user-specified timeseries with added noise at all (versus either/both of generating the R ( t ) from the underlying stochastic process or taking a user-specified timeseries as is (which the user can generate with whatever process they like)
Yeah, I guess the "added noise" could certainly happen outside the function in any simulation analysis.
This PR closes #3 and attempts to make the
generate_simulated_data()
function a series of calls to other functions that represent the critical model components, with the goal being that you can swap in and out different functions for different models (for example how to get the subpopulation R(t) estimates).