epinowcast / epidist

An R package for estimating epidemiological delay distributions
http://epidist.epinowcast.org/
Other
9 stars 4 forks source link

Posterior prediction for lognormal #214

Closed athowes closed 3 weeks ago

athowes commented 1 month ago

The code prep <- brms::prepare_predictions(fit) works.

To fit into the way brms handles things what we would need to do is write custom functions like:

log_lik_latent_lognormal <- function(i, prep) {
}
posterior_predict_latent_lognormal <- function(i, prep, ...) {
}
posterior_epred_latent_lognormal <- function(prep) {
}

With these three things defined then all the standard brms post-processing should work.

Downside: have to write these functions by family class.

Suggestion: write this strata prediction function still (in a way which is 100% class agnostic) and then also write these above functions for the most common families (could just start with lognormal).

I'd say that these functions are more likely to be used by expert users of the package, and then the delay_samples function would be more likely to be used for "off the shelf" analysis by most users.

Originally posted by @athowes in https://github.com/epinowcast/epidist/issues/188#issuecomment-2260072005

athowes commented 1 month ago

Seems like the log_lik part is harder here, so first PR for this can just solve the other two parts. Edit: split into #240.

athowes commented 1 month ago

Note that Paul writes:

Make sure to add a ... argument to your posterior_predict function even if you are not using it, since some families require additional arguments.

I don't get this? Don't I know if my function requires additional arguments?

athowes commented 1 month ago

In F2F meeting:

athowes commented 1 month ago

Within this issue we should:

  1. Remove predict_delay_parameters
  2. Replace any instances with the new functionality
athowes commented 1 month ago

We already have this function in the package:

simulate_double_censored_pmf <- function(
  alpha, beta, max, n = 1000,
  rprimary = \(x) (runif(x, 0, 1)), rdelay = rlnorm,
  delay_obs_process = \(s, p) (floor(s) - floor(p))
) {
  primary <- rprimary(n)
  secondary <- primary + rdelay(n, alpha, beta)
  delay <- delay_obs_process(secondary, primary)
  delay <- pmax(delay, 0)
  cdf <- ecdf(delay)(0:max)
  pmf <- c(cdf[1], diff(cdf))
  return(pmf)
}

It seems pretty simple to explain what is happening here. There are n primary event times simulated from some generic distribution, the secondary event times are the primary event times plus some delays again simulated from some distribution. Then there is a "delay observation process" as a function of the true primary and secondary event times that gets us to the observed delays. Then you make them non-negative, and compute the ECDF and EPMF.

The challenge to me would be translating from this quite general version where you offload specification of rprimary, rdelay and delay_obs_process to something specific which uses the data from the fitted model to get at these. Of course as it's a family specific function we know the delays are lognormal. We know the primary event times are uniform (for now). So the main part that is confusing to me is the delay_obs_process.

As a side note, presumably we should delete this function from the package.

seabbs commented 1 month ago

I don't get this? Don't I know if my function requires additional arguments?

This seems to refer to writing custom posterior_predict methods I think and is good practice so s3 methods.

seabbs commented 4 weeks ago

Note that simulate_double_censored is being depreciated from epinowcast in favour of: https://github.com/epinowcast/primarycensoreddist