CDCgov / PyRenew

Python package for multi-signal Bayesian renewal modeling with JAX and NumPyro.
https://cdcgov.github.io/PyRenew/
Apache License 2.0

Replicate Model 2 from cdcgov/wastewater-informed-covid-forecasting #3

Closed ghost closed 6 months ago

ghost commented 7 months ago

Goal

Have a working Python library that implements the most basic version of the wastewater model. By most basic, we mean (a) hospitalizations only, and (b) single geographical unit (no pooled model).
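For orientation, here is a minimal sketch of what "hospitalizations only, single geographical unit" could look like as a deterministic renewal process. It is not the wastewater model itself: the function names, the generation-interval and delay vectors, and the infection-hospitalization ratio `ihr` are all hypothetical, and a real implementation would put priors on these quantities and use JAX/NumPyro rather than plain Python.

```python
def renewal_infections(rt, gen_int, i0):
    """I_t = R_t * sum_s g_s * I_{t-s}, seeded with initial infections i0."""
    infections = list(i0)  # history, oldest first
    for r in rt:
        # Pair g_1 with the most recent infection, g_2 with the one before, ...
        recent = infections[::-1][: len(gen_int)]
        force = sum(g * i for g, i in zip(gen_int, recent))
        infections.append(r * force)
    return infections[len(i0):]  # drop the seeding period


def expected_hospitalizations(infections, ihr, delay):
    """H_t = ihr * sum_d delay_d * I_{t-d} for d = 0, 1, ...

    Infections before the start of the series are ignored (treated as zero).
    """
    hosp = []
    for t in range(len(infections)):
        total = sum(
            delay[d] * infections[t - d] for d in range(len(delay)) if t - d >= 0
        )
        hosp.append(ihr * total)
    return hosp


# Toy inputs, purely illustrative.
infections = renewal_infections(rt=[2.0, 2.0], gen_int=[0.5, 0.5], i0=[1.0, 1.0])
hospitalizations = expected_hospitalizations(infections, ihr=0.1, delay=[0.0, 1.0])
```

The observation model (for example, a count likelihood around the expected hospitalizations) is left out on purpose; the sketch only shows the latent infection and hospitalization layers.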

Context

See #32.

Required features

Specifications

Out of scope

Features beyond the basic model.

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/44 author: @gvegayoncdc

ghost commented 7 months ago

@afmagee42 and @kaitejohnson, could you point me to the data + estimates you use for CI in the WW model 👼?

afmagee42 commented 7 months ago

For our CI on the model, our testing inputs and expected outputs are stored in an .rda file. These are produced in internal_data.R, using package-public data. The internal testing data are Stan data objects and single Stan output iterations from short runs.

The public example data is generated in example_data.R and lives in the package's data folder. These are simulations of semi-processed data (post-pull, pre-Stan-preprocessing).

(Yes, this is a bit messy. Blame me and my insistence that users should not have access to purely testing data in the package; that's why it's set up this way.)

Pipeline testing/CI is handled at the repo level and currently just checks that the pipeline runs without errors.

afmagee42 commented 7 months ago

We specifically don't use any real data or estimates in order to make sure there's no potential data leak, as this lives in the public repo.

Our first pass at CI data for checking that the Stan output was consistent was taken from a posterior-predictive dataset from a real model run.

The testing would also have included (or perhaps just been) the computation of the joint posterior density at some arbitrary parameter values, but cmdstanr won't expose the requisite functions in WSL, so we had to evaluate based on MCMC output instead. (I don't like that this conflates changes in the model with any changes Stan makes to the algorithm in one place, but it's what we've got.)
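To illustrate the density-based check described above, here is a minimal sketch on a toy model (an Exponential prior on a Poisson rate), not the wastewater model. The function names and the toy model are assumptions made for illustration. The point is that the log joint density at fixed parameter values is deterministic, so a test on it changes only when the model changes, not when the sampler does.

```python
import math


def log_joint(lam, counts, prior_rate=1.0):
    """Log joint density of a toy model:
    lam ~ Exponential(prior_rate), counts[i] ~ Poisson(lam).
    """
    if lam <= 0:
        return -math.inf  # outside the support of the prior
    log_prior = math.log(prior_rate) - prior_rate * lam
    log_lik = sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)
    return log_prior + log_lik


def matches_reference(value, reference, tol=1e-9):
    """Compare a freshly computed density against a stored reference value."""
    return math.isclose(value, reference, rel_tol=tol, abs_tol=tol)


# Reference value worked out by hand for lam = 1 and counts = [0, 1, 2]:
# prior gives -1, each count gives -1, and the count of 2 adds -log(2!).
reference = -4.0 - math.log(2.0)
ok = matches_reference(log_joint(1.0, [0, 1, 2]), reference)
```

On the Python side, NumPyro provides `numpyro.infer.util.log_density` for evaluating a model's log joint at given parameter values, which could play the role of `log_joint` here; how it would be wired into this package's tests is left open.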

gvegayon commented 6 months ago

PR #55 is completing the testing part.