Function for pointwise log likelihood

athowes commented 1 month ago

For compatibility of our models with model comparison tools like loo we need to write functions for the pointwise log likelihood.

For example, for the latent lognormal , this would look like the following:

log_lik_latent_lognormal <- function(i, prep) {
  mu <- brms::get_dpar(prep, "mu", i = i)
  sigma <- brms::get_dpar(prep, "sigma", i = i)
  y <- prep$data$Y[i]
  vreal1 <- prep$data$vreal1[i]
  vreal2 <- prep$data$vreal2[i]
  vreal3 <- prep$data$vreal3[i]

  # Need to be integrating these out?
  # swindow_raw <- ...
  # pwindow_raw <- ...

  # pwindow <- ...
  # swindow <- ... 
  # obs_t <- ...
  # latent_lognormal_lpdf(y, mu, sigma, pwindow, swindow, obs_t)
}

Here I have used latent_lognormal_lpdf as if it were exported from the Stan code, but it's not possible to do that.

I think that the challenging part of doing this is that the latent variables like swindow_raw and pwindow_raw need to be integrated out. I could probably write some way to do this using quadrature.

Is this a problem you've already encountered in the academic side of this work @seabbs @kgostic @parksw3? Any pointers?

How does this fit into the bigger picture

I think model comparison is an important feature to facilitate, and by writing this one function we get all of the loo functionality (I think)
I think we should demonstrate model comparison in #52
This issue has been split off from #214 because it's the harder part and not required for prediction

athowes commented 1 month ago

This code I wrote in PR #220 might be useful:

#' The latent lognormal LPDF function
#'
#' To be tested against an exposed Stan version. I think that this would be
#' used in the forthcoming `log_lik_latent_lognormal` function.
#'
#' @param y ...
#' @param mu ...
#' @param sigma ...
#' @param pwindow ...
#' @param swindow ...
#' @param obs_t ...
#' @family postprocess
#' @autoglobal
#' @export
latent_lognormal_lpdf <- function(y, mu, sigma, pwindow, swindow, obs_t) {
  d <- y - pwindow + swindow
  obs_time <- obs_t - pwindow
  lpdf <- dlnorm(d, meanlog = mu, sdlog = sigma, log = TRUE)
  lcdf <- plnorm(obs_time, meanlog = mu, sdlog = sigma, log.p = TRUE)
  return(sum(lpdf - lcdf))
}

athowes commented 2 weeks ago

Based on F2F with @kgostic and @parksw3:

May just be able to do this with the samples from the latent parameters rather than trying to integrate them out
Have a look around for what Stan people are doing / as to whether there is any separate integration

athowes commented 2 weeks ago

This function relies on the prep object from brms.

The prep object has the following elements:

> names(prep)
[1] "ndraws" "nobs"   "resp"   "family" "dpars"  "nlpars" "refcat" "ac"     "data"

I am looking for the latent variables needed to compute the log likelihood. I think they should be in prep$data but alas:

> names(prep$data)
[1] "Y"        "vreal1"   "vreal2"   "vreal3"   ""         ""         "wN"       "noverlap" "woverlap" ""         ""         ""

I wonder if there is a way to get them into prep? Put in generated quantities?
How come we get wN noverlap and woverlap out? These are just deterministic quantities I guess?
I think this might have been why I was thinking we needed to integrate them out, since there isn't an easy way to get at the sampled values (?)

Here is what is going on inside brms::prepare_predictions:

> brms:::prepare_predictions.brmsfit
function (x, newdata = NULL, re_formula = NULL, allow_new_levels = FALSE, 
    sample_new_levels = "uncertainty", incl_autocor = TRUE, oos = NULL, 
    resp = NULL, ndraws = NULL, draw_ids = NULL, nsamples = NULL, 
    subset = NULL, nug = NULL, smooths_only = FALSE, offset = TRUE, 
    newdata2 = NULL, new_objects = NULL, point_estimate = NULL, 
    ndraws_point_estimate = 1, ...) 
{
    x <- restructure(x)
    options(.brmsfit_version = x$version$brms)
    on.exit(options(.brmsfit_version = NULL))
    snl_options <- c("uncertainty", "gaussian", "old_levels")
    sample_new_levels <- match.arg(sample_new_levels, snl_options)
    ndraws <- use_alias(ndraws, nsamples)
    draw_ids <- use_alias(draw_ids, subset)
    warn_brmsfit_multiple(x, newdata = newdata)
    newdata2 <- use_alias(newdata2, new_objects)
    x <- exclude_terms(x, incl_autocor = incl_autocor, offset = offset, 
        smooths_only = smooths_only)
    resp <- validate_resp(resp, x)
    draw_ids <- validate_draw_ids(x, draw_ids, ndraws)
    draws <- as_draws_matrix(x)
    draws <- suppressMessages(subset_draws(draws, draw = draw_ids))
    draws <- point_draws(draws, point_estimate, ndraws_point_estimate)
    sdata <- standata(x, newdata = newdata, re_formula = re_formula, 
        newdata2 = newdata2, resp = resp, allow_new_levels = allow_new_levels, 
        internal = TRUE, ...)
    new_formula <- update_re_terms(x$formula, re_formula)
    bframe <- brmsframe(new_formula, data = x$data)
    prep_re <- prepare_predictions_re_global(bframe = bframe, 
        draws = draws, sdata = sdata, resp = resp, old_reframe = x$ranef, 
        sample_new_levels = sample_new_levels, )
    prepare_predictions(bframe, draws = draws, sdata = sdata, 
        prep_re = prep_re, resp = resp, sample_new_levels = sample_new_levels, 
        nug = nug, new = !is.null(newdata), oos = oos, stanvars = x$stanvars)
}

Which I think is then feeding into:

> brms:::prepare_predictions.bframel
function (x, draws, sdata, ...) 
{
    ndraws <- nrow(draws)
    nobs <- sdata[[paste0("N", usc(x$resp))]]
    out <- nlist(family = x$family, ndraws, nobs)
    class(out) <- "bprepl"
    out$fe <- prepare_predictions_fe(x, draws, sdata, ...)
    out$sp <- prepare_predictions_sp(x, draws, sdata, ...)
    out$cs <- prepare_predictions_cs(x, draws, sdata, ...)
    out$sm <- prepare_predictions_sm(x, draws, sdata, ...)
    out$gp <- prepare_predictions_gp(x, draws, sdata, ...)
    out$re <- prepare_predictions_re(x, sdata, ...)
    out$ac <- prepare_predictions_ac(x, draws, sdata, nat_cov = FALSE, 
        ...)
    out$offset <- prepare_predictions_offset(x, sdata, ...)
    out
}

So perhaps it looks like there are lots of these special parts prepare_predictions_xy which get out parts of the draws that particular brms models need. Maybe we are in the bad world where we'd need something like this. I don't know if there would be a way to tag the parameters we need as one of these things.

Again useful to have the Stan code to refer back to here:

> stancode
// generated with brms 2.21.6
functions {
  // code chunks used from epidist 0.0.0.9000

  // Here the strings
// * lognormal
// * vector mu, vector sigma
// * mu, sigma
// are/have been replaced using regex

real latent_lognormal_lpdf(vector y, vector mu, vector sigma, vector pwindow,
                           vector swindow, array[] real obs_t) {
  int n = num_elements(y);
  vector[n] d = y - pwindow + swindow;
  vector[n] obs_time = to_vector(obs_t) - pwindow;
  return lognormal_lpdf(d | mu, sigma) - lognormal_lcdf(obs_time | mu, sigma);
}
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  // data for custom real vectors
  array[N] real vreal1;
  // data for custom real vectors
  array[N] real vreal2;
  // data for custom real vectors
  array[N] real vreal3;
  int prior_only;  // should the likelihood be ignored?
  int wN;
  array[N - wN] int noverlap;
  array[wN] int woverlap;
}
transformed data {
}
parameters {
  real Intercept;  // temporary intercept for centered predictors
  real Intercept_sigma;  // temporary intercept for centered predictors
  vector<lower = 0, upper = 1>[N] swindow_raw;
vector<lower = 0, upper = 1>[N] pwindow_raw;
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  vector<lower = 0>[N] pwindow;
vector<lower = 0>[N] swindow;
swindow = to_vector(vreal3) .* swindow_raw;
pwindow[noverlap] = to_vector(vreal2[noverlap]) .* pwindow_raw[noverlap];
if (wN) {
  pwindow[woverlap] = swindow[woverlap] .* pwindow_raw[woverlap];
}
  lprior += normal_lpdf(Intercept | 2, 0.5);
  lprior += normal_lpdf(Intercept_sigma | 0, 0.5);
}
model {
  swindow_raw ~ uniform(0, 1);
pwindow_raw ~ uniform(0, 1);
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] sigma = rep_vector(0.0, N);
    mu += Intercept;
    sigma += Intercept_sigma;
    sigma = exp(sigma);
    target += latent_lognormal_lpdf(Y | mu, sigma, pwindow, swindow, vreal1);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
  // actual population-level intercept
  real b_sigma_Intercept = Intercept_sigma;
}

athowes commented 2 weeks ago

As to what should be happening:

Data $y = (y_1, \ldots, y_n)$
Parameters $\theta = (\theta_1, \ldots, \theta_p)$
Likelihood $p(y | \theta)$
Prior $p(\theta)$

Usually there would be some parameters in the $\theta$ (or transformations of them) which would be sufficient for the likelihood. Things like the "distributional parameters" which are extracted out with brms::get_dpar.

My guess is that for the "pointwise" log-likelihood we do want to be using all of the draws of $\theta$ and not integrating anything out.

The alternative would be something like. Let $\theta = (\phi, \lambda)$. Where $\phi$ are "latent" variables. Then the marginal likelihood is:

$$ p(y | \lambda) = \int p(y | \phi, \lambda) p(\phi | \lambda) \text{d}\phi $$

athowes commented 2 weeks ago

I've managed to get something working here by:

Generating the latent variables randomly rather than extracting them
Ignoring the overlap bit
should be easy to fix. 1. might be hard.

athowes commented 2 weeks ago

Added a question on the Stan forum about this: https://discourse.mc-stan.org/t/log-likelihood-for-brms-custom-family-that-depends-on-latent-variables/36415

epinowcast / epidist

Function for pointwise log likelihood #240

How does this fit into the bigger picture