@afmagee42 and @kaitejohnson, could you point me to the data + estimates you use for CI in the WW model 👼?
For our CI on the model, our testing inputs and expected outputs are stored in an .rda file. These are produced in internal_data.R, using package-public data. The internal testing data are stan data objects and single stan output iterations from short runs.
The public example data are generated in example_data.R and live in the package's data folder. These are simulations of semi-processed data (post-pull, pre stan-preprocessing).
(Yes this is a bit messy, blame me and my insistence that users should not have access to purely testing data in the package, because that's why it's this way.)
Pipeline testing/CI is handled at the repo level and currently just checks that the pipeline runs without errors.
We specifically don't use any real data or estimates in order to make sure there's no potential data leak, as this lives in the public repo.
Our first pass at CI data, used to check that the stan output was consistent, was taken from a posterior-predictive dataset from a real model run.
The testing would have also included (or perhaps just been) the computation of the joint posterior density at some arbitrary parameter values, but cmdstanr won't expose the requisite functions in WSL so we had to evaluate based on MCMC output instead. (I don't like that this compounds both changes in the model and any changes stan makes to the algorithm in one place, but it's what we've got.)
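For illustration, here is a minimal sketch of what a density-based check like the one described above could look like. It is written in plain Python against a toy model, not the actual WW model or its stan code. The model (`y ~ Normal(mu, exp(log_sigma))` with standard normal priors on `mu` and `log_sigma`), the names `log_joint_density` and `check_log_density`, and the stored reference value are all assumptions made up for this example.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def log_joint_density(mu, log_sigma, y):
    """Log joint density of a toy model (hypothetical stand-in for the real one).

    Priors:     mu ~ Normal(0, 1), log_sigma ~ Normal(0, 1)
    Likelihood: y_i ~ Normal(mu, exp(log_sigma))
    """
    sigma = math.exp(log_sigma)
    lp = normal_logpdf(mu, 0.0, 1.0)
    lp += normal_logpdf(log_sigma, 0.0, 1.0)
    lp += sum(normal_logpdf(yi, mu, sigma) for yi in y)
    return lp


# Arbitrary but fixed inputs, stored alongside the expected value.
FIXED_PARAMS = {"mu": 0.0, "log_sigma": 0.0}
FIXED_DATA = [0.0, 1.0]

# Reference value for the inputs above: -2 * log(2 * pi) - 0.5.
REFERENCE_LOG_DENSITY = -4.175754132818691


def check_log_density(rel_tol=1e-9):
    """Return True if the density at the fixed inputs matches the reference."""
    value = log_joint_density(
        FIXED_PARAMS["mu"], FIXED_PARAMS["log_sigma"], FIXED_DATA
    )
    return math.isclose(value, REFERENCE_LOG_DENSITY, rel_tol=rel_tol)
```

Because the density is a deterministic function of the parameters and data, a check like this would only fail when the model itself changes, not when the sampler does. That is the separation the MCMC-output comparison cannot give.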
PR #55 is completing the testing part.
Goal
Have a working Python library that implements the most basic version of the wastewater model. By most basic, we mean (a) hospitalizations only, and (b) single geographical unit (no pooled model).
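As a rough sketch of the scope described above, the following shows what a hospitalizations-only, single-location forward simulation could look like in plain Python. It assumes a standard discrete renewal process with a delay from infection to admission. The function names, the parameterization (`rt`, `generation_interval`, `inf_to_hosp_delay`, `ihr`) and the absence of any observation noise or inference are assumptions for illustration only; the actual specification is whatever the linked issues define.

```python
def weighted_past(history, weights):
    """Sum of weights[s] * history[-(s + 1)] over the available lags."""
    return sum(w * x for w, x in zip(weights, reversed(history)))


def simulate_infections(rt, generation_interval, seed_infections):
    """Renewal equation: I_t = R_t * sum_s g_s * I_{t-s}, with s starting at lag 1.

    generation_interval[0] is the weight on lag 1, [1] on lag 2, and so on.
    Returns the seed infections followed by one value per entry of rt.
    """
    infections = list(seed_infections)
    for r in rt:
        infections.append(r * weighted_past(infections, generation_interval))
    return infections


def expected_hospitalizations(infections, inf_to_hosp_delay, ihr, n_seed):
    """Expected admissions: H_t = ihr * sum_d delay_d * I_{t-d}, d starting at lag 0.

    Only time points after the seeding period are returned.
    """
    hosp = []
    for t in range(n_seed, len(infections)):
        past = infections[: t + 1]
        hosp.append(ihr * weighted_past(past, inf_to_hosp_delay))
    return hosp


def simulate_hospitalizations(
    rt, generation_interval, inf_to_hosp_delay, ihr, seed_infections
):
    """Single geographical unit, hospitalizations only (no wastewater signal)."""
    infections = simulate_infections(rt, generation_interval, seed_infections)
    return expected_hospitalizations(
        infections, inf_to_hosp_delay, ihr, n_seed=len(seed_infections)
    )
```

For example, `simulate_hospitalizations([2.0, 2.0], [1.0], [0.0, 1.0], 0.5, [1.0])` doubles infections each step from a single seed and admits half of them one step after infection.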
Context
See #32.
Required features
Specifications
model/src implementing the model (like this)
Out of scope
Features beyond the basic model.
Related documents
Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/44 author: @gvegayoncdc