CDCgov / multisignal-epi-inference

Python package for statistical inference and forecasting of epidemiological models using multiple signals
https://cdcgov.github.io/multisignal-epi-inference/

add arviz plot for posterior predictive #234

Open sbidari opened 3 days ago

sbidari commented 3 days ago

Add an 'hdi_type' plot with equal-tailed credible intervals to demonstrate the posterior predictive distribution using arviz

(second try after closing previous one due to mangled commit history)
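For context, an equal-tailed credible interval keeps equal probability mass in each tail (e.g. the 5th and 95th percentiles for a 90% interval), in contrast to arviz's default highest-density intervals. A minimal sketch of such a band, using synthetic draws in place of the model's actual posterior predictive samples (all names and shapes here are illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# stand-in posterior predictive draws: (n_draws, n_timepoints)
draws = rng.poisson(lam=20.0, size=(500, 30))

# equal-tailed 90% credible interval: 5th and 95th percentiles per timepoint
lower, upper = np.quantile(draws, [0.05, 0.95], axis=0)
median = np.quantile(draws, 0.5, axis=0)

t = np.arange(draws.shape[1])
fig, ax = plt.subplots()
ax.fill_between(t, lower, upper, alpha=0.3, label="90% equal-tailed CI")
ax.plot(t, median, label="posterior median")
ax.set_xlabel("time")
ax.set_ylabel("hospital admissions")
ax.legend()
```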

codecov[bot] commented 3 days ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (8291c02) to head (f13c04d).

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##            main      #234      +/-   ##
===========================================
+ Coverage   92.59%   100.00%    +7.40%
===========================================
  Files          40         2       -38
  Lines         891         7      -884
===========================================
- Hits          825         7      -818
+ Misses         66         0       -66
```

| Flag | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/CDCgov/multisignal-epi-inference/pull/234/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=CDCgov) | `100.00% <ø> (+7.40%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=CDCgov#carryforward-flags-in-the-pull-request-comment) to find out more.


sbidari commented 2 days ago

Did you try anything with plot_lm, plot_ppc, or plot_ts?

I couldn't get the 'timeseries' type plots (plot_lm and plot_ts) to work for the hospital admissions model. I suspect it has something to do with the padding, since those plots do work for the basic model implementation. The output from plot_lm would be similar to the figure generated here.

I was not sure if we wanted multiple posterior predictive plots. I can include figures using plot_ppc if you'd like. @damonbayer
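In case it helps the discussion, here is a rough sketch of what a plot_ppc call could look like, using synthetic data in place of the fitted model (the variable name "obs" and the shapes are assumptions, not the package's actual output):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import arviz as az

rng = np.random.default_rng(0)
observed = rng.poisson(20.0, size=30)
# posterior predictive draws with arviz's (chain, draw, time) convention
pp = rng.poisson(20.0, size=(1, 500, 30))

idata = az.from_dict(
    posterior_predictive={"obs": pp},
    observed_data={"obs": observed},
)
# overlay a subset of predictive draws on the observed distribution
ax = az.plot_ppc(idata, num_pp_samples=100)
```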

damonbayer commented 2 days ago

I couldn't get the 'timeseries' type plots (plot_lm and plot_ts) to work for the hospital admissions model. I suspect it has something to do with the padding as those plots work for the basic model implementation.

Can you demonstrate those plots working somewhere? In this tutorial or a different one.

I was not sure if we wanted multiple posterior predictive plots. I can include figures using plot_ppc if you'd like. @damonbayer

Yes, please.

sbidari commented 2 days ago

I couldn't get the 'timeseries' type plots (plot_lm and plot_ts) to work for the hospital admissions model. I suspect it has something to do with the padding as those plots work for the basic model implementation.

Can you demonstrate those plots working somewhere? In this tutorial or a different one.

Ok! I will add demonstration of these to getting started tutorial.

sbidari commented 1 day ago

I do not understand why the posterior predictive distribution for the hospital admissions model does not have the same dimension as the observation.

[screenshot: hosp2]

Am I missing something here? @damonbayer

damonbayer commented 1 day ago

Am I missing something here? @damonbayer

It is because of the padding and/or seeding. I would try padding in the data in InferenceData with missing values for now. If that doesn't work, we can change the model a bit or manually modify the samples.
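A sketch of the padding idea: if the posterior predictive has some number of extra leading timepoints (from padding and/or seeding), the observed series stored in the InferenceData could be front-padded with NaN so the two align. The names and the padding length here are hypothetical:

```python
import numpy as np

n_pad = 5                               # hypothetical number of padded/seeded timepoints
observed = np.arange(30, dtype=float)   # stand-in for daily_hosp_admits
# front-pad the observations with NaN so they match the predictive dimension
observed_padded = np.concatenate([np.full(n_pad, np.nan), observed])
```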

All of this will be handled better once we have time incorporated throughout the project.

sbidari commented 1 day ago

That's what I thought too but this is for the model fit without padding shown below:

hosp_model.run(
    num_samples=1000,
    num_warmup=1000,
    data_observed_hosp_admissions=dat["daily_hosp_admits"].to_numpy(),
    rng_key=jax.random.PRNGKey(54),
    mcmc_args=dict(progress_bar=False, num_chains=1),
)

Since we are not padding the observation data here, padding shouldn't come into play, right?

damonbayer commented 1 day ago

Since we are not padding the observation data here, padding shouldn't come into play, right?

It's just because of the seeding, then.

damonbayer commented 1 day ago

I do not understand why the posterior predictive distribution for the hospital admissions model does not have the same dimension as the observation.

[screenshot: hosp2]

Am I missing something here? @damonbayer

Try supplying the argument data_observed_hosp_admissions instead of n_timepoints_to_simulate.

I think it would be hosp_model.posterior_predictive(data_observed_hosp_admissions=dat["daily_hosp_admits"].to_numpy().astype(float)).

You could save that as its own variable somewhere, since we have now used it twice.
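A sketch of that refactor, with pandas standing in for however dat is actually loaded in the tutorial (the hosp_model calls are quoted from the thread, not executed here):

```python
import pandas as pd

# stand-in data frame; in the tutorial, dat is loaded from the package's example data
dat = pd.DataFrame({"daily_hosp_admits": [3, 5, 4, 6]})

# save the converted observations once and reuse them
daily_hosp_admits = dat["daily_hosp_admits"].to_numpy().astype(float)

# then both calls can share the same variable, e.g.:
# hosp_model.run(..., data_observed_hosp_admissions=daily_hosp_admits)
# hosp_model.posterior_predictive(data_observed_hosp_admissions=daily_hosp_admits)
```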

sbidari commented 1 day ago

Try supplying the argument data_observed_hosp_admissions instead of n_timepoints_to_simulate.

I think it would be hosp_model.posterior_predictive(data_observed_hosp_admissions=dat["daily_hosp_admits"].to_numpy().astype(float)).

This solves the dimension issue but now the posterior predictive samples are just observed data.

[screenshot: Capture]

damonbayer commented 1 day ago

This solves the dimension issue but now the posterior predictive samples are just observed data.

Oops. We will need to change the model code.

sbidari commented 1 day ago

I noticed that for both prior_predictive and posterior_predictive, if called with the data_observed_hosp_admissions=dat["daily_hosp_admits"].to_numpy().astype(float) argument, the predictive distribution returned is simply the observed data.