Standardize and stabilize outputs of `wwinference()` wrapper function

kaitejohnson commented 3 months ago

Goal

Before a first release (checklist here), we will want to make sure that the main model fitting wrapper function, wwinference(), returns a stable set of contents. Right now, we are returning a post-processed draws dataframe plus mappings to sites and labs, dates, and maps to the subpopulations. We also run a diagnostic check inside the wrapper function, while the vignette separately shows the user how to run the same check on the stan fit object.

I think it would be best not to mix post-processing and model fitting at all, and instead write wwinference() so that it returns a nested list, ww_fit, containing raw_stan_fit_obj, input_data, and stan_data_args. Only the first two will be used in downstream functions, but, as with passing around the input data, it might be helpful to pass around the inputs to the stan model (e.g. the priors) so that outputs fit with different priors can be compared.
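To make the proposed return value concrete, here is a minimal base-R sketch of the nested list. It is an illustration, not the package's implementation: `make_ww_fit()`, `fake_fit()`, and the prior names `prior_mean_r` / `prior_sd_r` are hypothetical, and the "fit" is a stand-in for the real stan fit object.

```r
# Hypothetical stand-in for wwinference(): it only bundles the fit, the input
# data, and the arguments passed to the model. No post-processing happens here.
make_ww_fit <- function(ww_data, hosp_data, stan_data_args, fit_model) {
  raw_stan_fit_obj <- fit_model(stan_data_args)
  list(
    raw_stan_fit_obj = raw_stan_fit_obj,
    input_data = list(ww_data = ww_data, hosp_data = hosp_data),
    stan_data_args = stan_data_args
  )
}

# Toy inputs (made-up values, for illustration only)
ww_data <- data.frame(
  date = as.Date("2024-01-01") + 0:2,
  log_conc = c(1.2, 1.5, 1.1)
)
hosp_data <- data.frame(
  date = as.Date("2024-01-01") + 0:2,
  admits = c(3L, 5L, 4L)
)
stan_data_args <- list(prior_mean_r = 1, prior_sd_r = 0.5) # hypothetical names

# Placeholder "fit": draws from the prior instead of running Stan
fake_fit <- function(args) {
  list(draws = rnorm(10, mean = args$prior_mean_r, sd = args$prior_sd_r))
}

ww_fit <- make_ww_fit(ww_data, hosp_data, stan_data_args, fake_fit)
names(ww_fit)
```

Keeping stan_data_args alongside the fit means two ww_fit objects can be compared by their priors without rerunning anything.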

From this, we would overload downstream functions to take in the ww_fit object and generate the draws dataframe, run the diagnostics, plot the results, and (eventually) evaluate the results. The main change from the current workflow would be to adjust get_draws_df() to take in the ww_fit object and to remove the internal calls to get_draws_df() and get_model_diagnostic_flags() from wwinference().

The goal of all of this would be to (1) modularize the codebase so each function performs one step in the workflow, and (2) put the onus on us to demonstrate the package workflow step by step, and on the user to perform each action on the fitted model object.

Requirements

[ ] output the following elements from wwinference(): raw_stan_fit_obj, input_data, and stan_data_args. All will be lists. Optionally, we could separate the ww_data and hosp_data and pass back both as dataframes.
[ ] rewrite get_draws_df() (perhaps rename to extract_posterior_draws()) to take in the ww_fit object as input and return something that looks like the current draws_df. Allow the user to pass in args to include only certain variables from a set of allowed variables.
[ ] revise vignette to demonstrate each of these steps
[ ] add a mermaid diagram describing how the functions interact
[ ] all revised functions (fitting wrapper functions and downstream functions) should be tested
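
The second requirement could be sketched roughly as below. This is a hedged illustration in base R: the allowed variable names (`predicted_counts`, `predicted_ww`, `rt`), the `what` argument, and the long-format layout of the toy draws are assumptions, and the toy `ww_fit` holds a plain data frame where the real object would hold a stan fit.

```r
# Sketch of extract_posterior_draws(): takes the ww_fit list and returns only
# the requested variables, rejecting anything outside the allowed set.
extract_posterior_draws <- function(ww_fit,
                                    what = c("predicted_counts", "predicted_ww", "rt")) {
  allowed <- c("predicted_counts", "predicted_ww", "rt") # hypothetical names
  unknown <- setdiff(what, allowed)
  if (length(unknown) > 0) {
    stop("Unknown variable(s): ", paste(unknown, collapse = ", "))
  }
  draws <- ww_fit$raw_stan_fit_obj$draws
  draws[draws$name %in% what, , drop = FALSE]
}

# Toy ww_fit with long-format draws (made-up values)
toy_ww_fit <- list(
  raw_stan_fit_obj = list(
    draws = data.frame(
      name = rep(c("predicted_counts", "predicted_ww", "rt"), each = 2),
      draw = rep(1:2, times = 3),
      value = c(10, 12, 1.4, 1.6, 0.9, 1.1)
    )
  ),
  input_data = list(),
  stan_data_args = list()
)

rt_draws <- extract_posterior_draws(toy_ww_fit, what = "rt")
all_draws <- extract_posterior_draws(toy_ww_fit)
```

Validating `what` explicitly, rather than silently dropping unknown names, makes a typo in a variable name fail loudly.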

@seabbs @gvegayon @dylanhmorris Curious whether you have any other thoughts here.

gvegayon commented 3 months ago

Without fully understanding the details, I think the key here is leveraging S3 methods:

If it makes sense, I would reuse existing generic methods like plot and summary to provide post-processing information. You also have the coef, vcov, and confint generics you can use to allow the user to intuitively extract information from the model.
For functions like get_draws_df(), you can always write a generic method so the function may take a fitted model or something else, giving flexibility without much of a hurdle.
I would also try to lean on existing methods from other packages like stan itself.
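
As a rough sketch of the generic-method idea for get_draws_df(), assuming the class name "wwinference_fit" and a toy fit whose draws are already a data frame (both assumptions; the real object would wrap a stan fit):

```r
# Generic: dispatches on the class of `x`
get_draws_df <- function(x, ...) UseMethod("get_draws_df")

# Method for the fitted-model object: pull out the draws and delegate
get_draws_df.wwinference_fit <- function(x, ...) {
  get_draws_df(x$raw_stan_fit_obj$draws, ...)
}

# Method for something that is already a data frame of draws
get_draws_df.data.frame <- function(x, ...) {
  x
}

# Fallback with an informative error for unsupported inputs
get_draws_df.default <- function(x, ...) {
  stop("get_draws_df() does not know how to handle class: ",
       paste(class(x), collapse = "/"))
}

# Toy fitted object (made-up values)
toy_fit <- structure(
  list(raw_stan_fit_obj = list(draws = data.frame(draw = 1:3, rt = c(0.9, 1.0, 1.1)))),
  class = "wwinference_fit"
)

from_fit <- get_draws_df(toy_fit)
from_df <- get_draws_df(toy_fit$raw_stan_fit_obj$draws)
```

Both calls return the same data frame, so the function accepts either a fitted model or raw draws.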

Edit:

I would also add:

[ ] Wrap the output from wwinference() as a structure with a class attribute, e.g., "wwinference_fit". (see here)
[ ] Add a print method for the class, print.wwinference_fit().
[ ] Add a plot method for the class.
[ ] Add a summary method for the class.
[ ] Add coef, vcov, and confint methods for the class.
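
A minimal sketch of the first, second, and fourth items, in base R only. The constructor name `new_wwinference_fit()`, the printed layout, and the summary columns are illustrative assumptions, and the toy draws are a plain data frame in place of a stan fit.

```r
# Constructor: wrap the three elements in a list with a class attribute
new_wwinference_fit <- function(raw_stan_fit_obj, input_data, stan_data_args) {
  structure(
    list(
      raw_stan_fit_obj = raw_stan_fit_obj,
      input_data = input_data,
      stan_data_args = stan_data_args
    ),
    class = "wwinference_fit"
  )
}

# print method: short overview, returns the object invisibly
print.wwinference_fit <- function(x, ...) {
  cat("<wwinference_fit>\n")
  cat("  elements:", paste(names(x), collapse = ", "), "\n")
  invisible(x)
}

# summary method: mean and sd of each column of the toy draws
summary.wwinference_fit <- function(object, ...) {
  draws <- object$raw_stan_fit_obj$draws
  data.frame(
    variable = names(draws),
    mean = vapply(draws, mean, numeric(1)),
    sd = vapply(draws, stats::sd, numeric(1)),
    row.names = NULL
  )
}

# Toy object (made-up values)
fit <- new_wwinference_fit(
  raw_stan_fit_obj = list(draws = data.frame(rt = c(0.9, 1.0, 1.1))),
  input_data = list(),
  stan_data_args = list()
)
print(fit)
fit_summary <- summary(fit)
```

The plot, coef, vcov, and confint methods would follow the same pattern: define `<generic>.wwinference_fit` and let R dispatch on the class attribute.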

kaitejohnson commented 3 months ago

@gvegayon I am currently writing a bunch of tests of the functions and think now would be a good time to do some of this, since it will require reworking wwinference() and the post-processing a bit (so get_draws_df() and get_model_diagnostic_flags()). I'm going to start a new PR for this which will include tests for everything from fitting to post-processing. Might tag you if I have questions on how to do the function overloading/creation of the class attribute, if that's ok!

CDCgov / ww-inference-model

Standardize and stabilize outputs of `wwinference()` wrapper function #49