CDCgov / ww-inference-model

An in-development R package and a Bayesian hierarchical model jointly fitting multiple "local" wastewater data streams and "global" case count data to produce nowcasts and forecasts of both observations
https://cdcgov.github.io/ww-inference-model/
Apache License 2.0

Standardize and stabilize outputs of `wwinference()` wrapper function #49

Closed kaitejohnson closed 1 month ago

kaitejohnson commented 2 months ago

Goal

Before a first release (checklist here), we want to make sure that the main model-fitting wrapper function, wwinference(), returns a stable set of contents. Right now, it returns a post-processed draws dataframe plus mappings to sites and labs, dates, and maps to the subpopulations. It also runs a diagnostic check internally, while the vignette separately shows the user how to run the same check from the stan fit object.

I think it would be best not to mix post-processing and model fitting at all, and instead write wwinference() so that it returns a nested list, ww_fit, containing raw_stan_fit_obj, input_data, and stan_data_args. Only the first two will be used in downstream functions, but just as we pass around the input data, it might be helpful to pass around the inputs to the stan model (e.g. the priors) so that outputs fit with different priors can be compared.
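As a rough illustration of the proposed return value, here is a minimal sketch of how such a nested list could be assembled. The three element names come from the proposal above; the constructor name new_ww_fit(), the "ww_fit" class attribute, and the stand-in values are assumptions made only so the example runs without Stan, not the package's actual code.

```r
# Hypothetical constructor: only the three element names come from the issue.
new_ww_fit <- function(raw_stan_fit_obj, input_data, stan_data_args) {
  stopifnot(is.list(input_data), is.list(stan_data_args))
  structure(
    list(
      raw_stan_fit_obj = raw_stan_fit_obj, # the untouched fit object
      input_data = input_data,             # data passed in by the user
      stan_data_args = stan_data_args      # inputs to the model, e.g. priors
    ),
    class = "ww_fit"
  )
}

# Stand-in values (assumed, not real model output)
fit <- new_ww_fit(
  raw_stan_fit_obj = list(draws = matrix(0, nrow = 10, ncol = 2)),
  input_data = list(ww_data = data.frame(), count_data = data.frame()),
  stan_data_args = list(prior_mean = 0, prior_sd = 1)
)
names(fit)
```

Keeping stan_data_args alongside the fit means two fits can be compared by inspecting fit$stan_data_args directly, without re-deriving the priors.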

From this, we would overload the downstream functions to take in the ww_fit object and generate the draws dataframe, run the diagnostics, plot the results, and (eventually) evaluate the results. The main change from the current workflow would be to adjust get_draws_df() to take in the ww_fit object and to remove the internal calls to get_draws_df() and get_model_diagnostic_flags() from wwinference().

The goal of all of this would be to (1) modularize the codebase so that each function performs one step in the workflow, and (2) put the onus on us to demonstrate the package workflow step by step, and on the user to perform each action on the fitted model object.
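The step-by-step workflow described above might look roughly like the sketch below. All three functions carry a _sketch suffix because they are hypothetical stand-ins: the fitting step fabricates a tiny placeholder object instead of calling Stan, and the column and field names (pred_value, n_divergent, site, flag_divergences) are assumptions for illustration only.

```r
# Step 1 (hypothetical): fit only, with no post-processing or diagnostics.
wwinference_sketch <- function(input_data, stan_data_args) {
  raw_fit <- list(
    draws = data.frame(draw = 1:4, pred_value = c(10, 12, 11, 13)),
    n_divergent = 0L
  )
  structure(
    list(
      raw_stan_fit_obj = raw_fit,
      input_data = input_data,
      stan_data_args = stan_data_args
    ),
    class = "ww_fit"
  )
}

# Step 2 (hypothetical): build the draws dataframe from the fitted object.
get_draws_df_sketch <- function(ww_fit) {
  draws <- ww_fit$raw_stan_fit_obj$draws
  draws$site <- ww_fit$input_data$site # attach metadata from the input data
  draws
}

# Step 3 (hypothetical): run diagnostics separately, on the same object.
get_model_diagnostic_flags_sketch <- function(ww_fit) {
  data.frame(flag_divergences = ww_fit$raw_stan_fit_obj$n_divergent > 0)
}

# The user performs each action explicitly on the fitted object.
ww_fit <- wwinference_sketch(list(site = "A"), list(prior_sd = 1))
draws_df <- get_draws_df_sketch(ww_fit)
flags <- get_model_diagnostic_flags_sketch(ww_fit)
```

Each function does one thing, so each can be tested in isolation with a small hand-built ww_fit object rather than a full model fit.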

Requirements

@seabbs @gvegayon @dylanhmorris Curious whether you have any other thoughts here

gvegayon commented 2 months ago

Without fully understanding the details, I think the key here is leveraging S3 methods:

  1. If it makes sense, I would reuse existing generic methods like plot and summary to provide post-processing information. You also have the coef, vcov, and confint generics you can use to allow the user to intuitively extract information from the model.
  2. For functions like get_draws_df(), you can always write a generic method so the function can take a fitted model or another type of object, which gives flexibility at little cost.
  3. I would also try to lean on existing methods from other packages, such as those from Stan itself.
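To make the S3 suggestions above concrete, here is a minimal sketch of a get_draws_df() generic with a method for a ww_fit object and a fallback for anything else, plus a summary() method that reuses the base generic. The method bodies are assumptions: the stand-in "fit" is just a matrix of draws, and the real methods would work with the actual Stan fit object.

```r
# Generic: dispatches on the class of x
get_draws_df <- function(x, ...) UseMethod("get_draws_df")

# Method for the fitted-model object: unwrap and re-dispatch
get_draws_df.ww_fit <- function(x, ...) {
  get_draws_df(x$raw_stan_fit_obj, ...)
}

# Fallback (assumed behavior): anything as.data.frame() can handle
get_draws_df.default <- function(x, ...) {
  as.data.frame(x)
}

# Reuse the existing summary() generic for post-processing output
summary.ww_fit <- function(object, ...) {
  draws <- get_draws_df(object)
  vapply(draws, mean, numeric(1)) # per-parameter mean over draws
}

# Stand-in fit: a 2 x 2 matrix of draws for parameters "a" and "b"
fit <- structure(
  list(raw_stan_fit_obj = matrix(
    c(1, 3, 2, 4),
    nrow = 2, dimnames = list(NULL, c("a", "b"))
  )),
  class = "ww_fit"
)

draws_df <- get_draws_df(fit)
fit_summary <- summary(fit)
```

With this layout, supporting a new input type only requires adding another get_draws_df.<class>() method; callers keep using the same function name.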

Edit:

I would also add:

kaitejohnson commented 2 months ago

@gvegayon I am currently writing a bunch of tests of the functions, and I think now would be a good time to do some of this, since it will require reworking wwinference() and the post-processing a bit (so get_draws_df() and get_model_diagnostic_flags()). I'm going to start a new PR for this, which will include tests for everything from fitting to post-processing. I might tag you if I have questions on how to do the function overloading or the creation of the class attribute, if that's ok!