Closed kaitejohnson closed 1 month ago
I like @dylanhmorris' suggestion. Like we are doing with the inference object, we could have additional S3 classes (e.g., wwinference_fit_draws) with a print method (so it is made evident what the object has) and a plot method. That said, maybe the _df suffix is not appropriate anymore?
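A minimal sketch of what such an S3 class with a print method could look like, in base R only. The constructor name new_wwinference_fit_draws(), the element names, and the toy data are all assumptions made for illustration; they are not part of the package.

```r
# Hypothetical constructor: wraps a named list of data frames in an S3 class.
new_wwinference_fit_draws <- function(x) {
  stopifnot(is.list(x), all(vapply(x, is.data.frame, logical(1))))
  structure(x, class = c("wwinference_fit_draws", "list"))
}

# print method: summarises what the object holds instead of dumping every row.
print.wwinference_fit_draws <- function(x, ...) {
  cat("Draws from a wwinference fit with", length(x), "element(s):\n")
  for (nm in names(x)) {
    cat(sprintf(
      "  - %s: %d rows x %d columns\n",
      nm, nrow(x[[nm]]), ncol(x[[nm]])
    ))
  }
  invisible(x)
}

# Toy data standing in for real posterior draws (element names are made up).
draws <- new_wwinference_fit_draws(list(
  predicted_counts = data.frame(draw = 1:3, value = c(10, 12, 11)),
  global_rt = data.frame(draw = 1:3, value = c(0.9, 1.1, 1.0))
))
print(draws)
```

Keeping "list" as the second class means ordinary list operations such as names() and [[ keep working on the object.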
That makes sense to me, @gvegayon. I think we could have the plot method plot the 4 outputs we already created plotting functions for (hospital admissions, wastewater concentrations, subpopulation R(t) estimates, and global R(t) estimates).
And yes, agree that we would want to rename to something like extract_posterior_draws()?
I was thinking it would be best to generate all of the output dataframes as a list by default, and in the vignette include these variables so it is clear to the user that they don't need to have all of them returned if they don't want them.
Hey @dylanhmorris and @kaitejohnson, I just updated the first comment of the issue with a draft solution. LMK what your thoughts are before I jump into coding!
Also, the .Deprecated() function is more critical in projects with many active users. We are not there yet, but I think (a) it is a good practice in general, and (b) it is a good opportunity to learn how to use it. The "R packages" book has a nice note on it here.
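For reference, a minimal sketch of the deprecation pattern with base R's .Deprecated(). The body of get_draws() here is a placeholder assumption, not the real implementation; only the alias pattern is the point.

```r
# Hypothetical replacement function; the body is a placeholder.
get_draws <- function(x, ...) {
  list(fit = x)
}

# Old name kept as a thin alias that warns and then forwards to the new one.
get_draws_df <- function(x, ...) {
  .Deprecated("get_draws")
  get_draws(x, ...)
}

# Calling the old name still returns a result, but signals a warning.
# Here the warning is caught, echoed as a message, and muffled.
res <- withCallingHandlers(
  get_draws_df("toy fit"),
  warning = function(w) {
    message("Caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

Because the alias forwards its arguments unchanged, existing user code keeps working while the warning points to the new name.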
Your spec looks good to me @gvegayon. I think a full plot method might not be needed.
The function will have an optional argument called what that will receive the strings "all", and whatever other elements can be extracted via tidybayes::spread_draws().
To confirm, you're proposing to take in (A) the names of parameters for which to get draws, not (B) actual expressions that can be passed as the expression argument to tidybayes::spread_draws(), yes?
@gvegayon This looks great to me. My question is the same as @dylanhmorris's but I think I have a strong opinion that it should expect the human readable language that we document somewhere e.g. "predicted hospital admissions", "subpopulation R(t)", etc rather than the stan latent variables.... @dylanhmorris what do you think? Just think this will be slightly cleaner from a user side experience.
I agree that the plot method isn't top priority, but you could just show in the vignette how you would pass each df to the corresponding plot function.
I was thinking to pass a human-readable expression, e.g., pass what="subpopulation R(t)" instead of a tidybayes expression. I can leave the plot method as an optional feat, but I don't think it will take too much effort to get it done. If that sounds good to you, I can start working on it.
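One way the human-readable what= values could be resolved internally is a simple lookup table, sketched below. The helper name resolve_what(), the label "predicted wastewater concentrations", and every internal name on the right-hand side are assumptions for illustration; they are not the model's actual Stan variable names.

```r
# Hypothetical lookup from documented, human-readable labels to internal names.
# The internal names on the right are placeholders.
what_lookup <- c(
  "predicted hospital admissions" = "pred_hosp",
  "predicted wastewater concentrations" = "pred_ww",
  "subpopulation R(t)" = "rt_subpop",
  "global R(t)" = "rt_global"
)

# Translate the user's `what` into internal names, failing early with a
# message that lists the valid options.
resolve_what <- function(what = "all") {
  if (identical(what, "all")) {
    return(unname(what_lookup))
  }
  unknown <- setdiff(what, names(what_lookup))
  if (length(unknown) > 0) {
    stop(
      "Unknown value(s) for `what`: ", paste(unknown, collapse = ", "),
      ". Valid options are \"all\", ",
      paste0("\"", names(what_lookup), "\"", collapse = ", "), ".",
      call. = FALSE
    )
  }
  unname(what_lookup[what])
}

resolve_what("subpopulation R(t)")
```

With this design the user-facing labels only need to be documented in one place, and the internal names can change without breaking user code.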
This sounds great to me @gvegayon thanks so much!
Goal
We currently use the get_draws_df() function to get out a tidy long form dataframe that contains all of the output types presumably of interest to the user: hospital admissions, site-lab level wastewater concentrations, site-level R(t) estimates, global R(t) estimates. This requires binding together a bunch of the outputs from tidybayes::spread_draws() joined to the relevant data. However, I think there are two issues that make this not ideal:
Draft solution [2024-09-09]
- get_draws_df() will now return a list of class wwinference_fit_draws (S3 class).
- get_draws_df() will be replaced by get_draws().
- The function will have an optional argument called what that will receive the strings "all", and whatever other elements can be extracted via tidybayes::spread_draws().
- get_draws_df() will still work but will be an alias to get_draws() using the .Deprecated() function call; this will throw a warning message to the users indicating the function get_draws_df() will be removed in the next version of the model.
- The class will have the methods plot() and print(), with the plot() method having a what= argument that will specify which bits of the list to plot using the existing plotting functions.
- wwinference_fit_draws will also have a method as.data.frame() to turn the list into a data frame.
- [...] what="all".
Proposed solution
@dylanhmorris suggested, and I think I now agree, that it would probably make more sense to split all these tidy draws dataframes up into a list of dataframes as the default output. We could provide an argument to the function that allows the user to specify which output type they want if they don't want to pull the extra dataframes into memory. Downstream functions (currently this is just plotting functions) would work similarly as now but would need to call the correct dataframe in the list.
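A rough sketch of the list-of-dataframes output with a selector argument, plus an as.data.frame() method that binds the list back into one long data frame. The toy_outputs object, its element names, and the output column are assumptions; a real implementation would build the pieces from the fitted model rather than take them as an argument.

```r
# Toy stand-ins for the per-output tidy draws (element names are made up).
toy_outputs <- list(
  hosp = data.frame(
    draw = 1:2, date = as.Date("2024-01-01") + 0:1, value = c(10, 12)
  ),
  global_rt = data.frame(
    draw = 1:2, date = as.Date("2024-01-01") + 0:1, value = c(0.9, 1.1)
  )
)

# Return every output by default, or only the requested ones via `what`.
get_draws <- function(outputs, what = "all") {
  if (!identical(what, "all")) {
    stopifnot(all(what %in% names(outputs)))
    outputs <- outputs[what]
  }
  structure(outputs, class = c("wwinference_fit_draws", "list"))
}

# Bind the list back into one long data frame, tagging each row with the
# output it came from. This assumes all elements share the same columns,
# which may not hold for the real outputs.
as.data.frame.wwinference_fit_draws <- function(x, ...) {
  pieces <- lapply(names(x), function(nm) cbind(output = nm, x[[nm]]))
  do.call(rbind, pieces)
}

only_rt <- get_draws(toy_outputs, what = "global_rt")
long <- as.data.frame(get_draws(toy_outputs))
```

Here only_rt holds a single data frame, so the other outputs are never kept in the returned object, while long recovers something close to the current long format for users who still want it.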
Another alternative would be to keep the long dataframe format, but again have a get_draws_df() argument allow the user to specify the output types they want returned.
Curious to hear others' thoughts on this @dylanhmorris @gvegayon @seabbs @akeyel