epinowcast / epidist

Estimate epidemiological delay distributions with brms
http://epidist.epinowcast.org/

Help user pass `newdata` as sensible things (e.g. all strata) #213

Closed athowes closed 3 months ago

athowes commented 3 months ago

In #210 we added functionality to produce predictions (of the delay parameters on both the internal and natural scales) via brms::prepare_predictions for any family.

There is an argument newdata as follows:

An optional data.frame for which to evaluate predictions.
If NULL (default), the original data of the model is used.
NA values within factors are interpreted as if all dummy variables of this factor are zero.
This allows, for instance, to make predictions of the grand mean when using sum coding.
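The last point of that documentation can be illustrated with a small base R sketch. It uses lm rather than brms, and the data are made up, so it only shows the general idea: under sum coding, setting every dummy variable of a factor to zero leaves just the intercept, which is the unweighted mean of the group means.

```r
# Toy data with a two-level factor; the values are made up for illustration.
d <- data.frame(
  sex = factor(c("f", "f", "m", "m")),
  y = c(1, 3, 5, 7)
)

# Sum coding: the level effects are constrained to sum to zero.
contrasts(d$sex) <- contr.sum(2)
fit <- lm(y ~ sex, data = d)

# With all dummy variables at zero only the intercept remains.
grand_mean <- unname(coef(fit)["(Intercept)"])

# Under sum coding the intercept is the unweighted mean of the group means.
group_means <- tapply(d$y, d$sex, mean)
isTRUE(all.equal(grand_mean, mean(group_means)))
```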

Following @seabbs who IMO correctly summarises where we should go:

We can also provide some functionality to either help users extract unique data points from their data (the simple version of all strata) and potentially to grid expand this to all combinations (i.e. observed/unobserved) but I think the first pass should be... (what we have already done)

Basically, we now need to help users specify common newdata options.

Options as far as I see it are either:

  1. Helper functions that users call themselves, outside predict_delay_samples
  2. Options in predict_delay_samples, with the helper functions called inside predict_delay_samples

I probably favour 2 over 1, but not strongly, and could be convinced otherwise.

seabbs commented 3 months ago

In terms of keeping things atomic, my preference is helper functions outside of the prediction function.
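As a rough sketch of what such standalone helpers could look like, here is a base R version. The function names observed_strata and all_strata, and the example columns, are hypothetical and not part of epidist; the result would still need whatever other columns newdata requires.

```r
# Hypothetical helper: one row per covariate combination observed in `data`.
observed_strata <- function(data, covariates) {
  strata <- unique(data[, covariates, drop = FALSE])
  rownames(strata) <- NULL
  strata
}

# Hypothetical helper: every combination of the observed covariate values,
# including combinations that never occur together in `data`.
all_strata <- function(data, covariates) {
  values <- lapply(data[covariates], unique)
  expand.grid(values, stringsAsFactors = FALSE)
}

# Made-up example data.
obs <- data.frame(
  sex = c(0, 0, 1),
  region = c("a", "b", "a"),
  delay = c(2.1, 3.4, 1.2)
)

observed_strata(obs, c("sex", "region"))  # 3 rows: only what was observed
all_strata(obs, c("sex", "region"))       # 4 rows: adds sex = 1, region = "b"
```

Keeping the two helpers separate from the prediction function means a user can inspect or edit the strata before passing them on as newdata.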

athowes commented 3 months ago

I do think we need to enforce that newdata is passed through as_latent_individual.

The question for me is: if we provide functionality to generate predictions for all strata, what values should we set for the other columns of newdata?

For example, say we have a model like 1 + sex on mu and sigma. Then to make predictions we still need a newdata with columns:

This is a little bit confusing to me. Is there some version of these predictions which is agnostic / integrates out / ... these other variables? Say I want to know about the expected delay distribution for a particular sex. Is there a version of that which isn't a function of the observation time?

athowes commented 3 months ago

This is a useful blog post: https://www.andrewheiss.com/blog/2021/11/10/ame-bayes-re-guide/#posterior-predictions

seabbs commented 3 months ago

Nice, that is useful. It looks like if we can plug into emmeans we can get most of the functionality a user might want much more simply.

athowes commented 3 months ago

  1. Predict for all 500 individuals and check whether the predictions are the same
  2. Set all non-covariates to NA in newdata and run
  3. Vary the non-covariates in newdata and check that the output does not change
  4. expand.grid on all covariates. How to extract the covariates? Go into the brms model and extract the terms in the formula
  5. NA out the covariates as well. Does that give an overall prediction?
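Steps 2 and 4 could be sketched in base R roughly as follows. This assumes plain R formulas (a brms model stores its formulas in its own object, so the extraction would need adapting). The helper names and the obs_time and delay columns are made up for illustration. Whether the prediction code accepts NA in the non-covariate columns is exactly what step 2 would test.

```r
# Hypothetical helper: variable names on the right-hand side of a formula.
formula_covariates <- function(formula) {
  rhs <- formula[[length(formula)]]
  all.vars(rhs)
}

# Hypothetical helper: set every non-covariate column to NA (step 2).
na_non_covariates <- function(data, covariates) {
  other <- setdiff(names(data), covariates)
  data[other] <- NA
  data
}

# A model like 1 + sex on both mu and sigma.
mu_formula <- delay ~ 1 + sex
sigma_formula <- sigma ~ 1 + sex
covariates <- union(
  formula_covariates(mu_formula),
  formula_covariates(sigma_formula)
)

# Made-up example data.
obs <- data.frame(
  sex = c(0, 1, 1),
  obs_time = c(10, 12, 15),
  delay = c(2.1, 1.2, 1.4)
)

# Step 2: keep the covariates, blank out everything else.
newdata_na <- na_non_covariates(obs, covariates)

# Step 4: all combinations of the observed covariate values.
grid <- expand.grid(lapply(obs[covariates], unique), stringsAsFactors = FALSE)
```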

seabbs commented 3 months ago

If we do this with emmeans I am not sure we need to supply any helpers like this, because it does much of this for us.

athowes commented 3 months ago

I've almost finished writing a first helper function for the new strata. I might suggest we complete adding this function, then create a new issue for interacting with emmeans. We can compare outputs from any potential emmeans implementation with this helper function.

athowes commented 3 months ago

Edit: the emmeans function is pretty good:

> emmeans::emmeans(fit_sex, specs = "sex")
 sex emmean lower.HPD upper.HPD
   0   2.02      1.95      2.08
   1   1.30      1.19      1.43

Limitations:

Hence:

Other:

seabbs commented 3 months ago

I think for a first pass we can get a lot of functionality from emmeans and we should do so (i.e. just point to it in the FAQ). I agree it's not that Bayesian, but I am surprised you can't get samples out.

Once we have that in place (which is quite good coverage), I think we should think again about these strata functions (or if you have some in place we can do that sooner rather than later).

athowes commented 3 months ago

Closing as not planned (unless it turns out to be hard to get things working with other packages).