arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0
1.56k stars 387 forks source link

Additional LOO functionality #2059

Open sethaxen opened 2 years ago

sethaxen commented 2 years ago

There are several useful functions that can be used for model assessment and comparison that could be added here. In some cases these are already available in loo or soon will be:

OriolAbril commented 2 years ago

Generally agree. Not so sure about the logo one. It would basically be

  1. groupby using xarray
  2. call az.loo

I personally think it would be better to document this and have users do the groupby themselves instead of adding a logo function. If we add a function, I expect users to assume only cv schemes directly compatible with loo or logo are supported, but they can also do any combination themselves and I don't think we can support all possible cases, a draft example of already using logo with loo would be https://nbviewer.org/github/OriolAbril/Exploratory-Analysis-of-Bayesian-Models/blob/multi_obs_ic/content/Section_04/Multiple_likelihoods.ipynb#Predicting-team-level-performance.

sethaxen commented 2 years ago

Good point about logo. I agree, if we can document it, and the code only takes a few lines, then it doesn't make sense to add a logo implementation for this. Thanks for the example; should be straightforward to adapt.

aadya940 commented 7 months ago

Hi @sethaxen , I was willing to implement the loo_expectation function in arviz.stats. From what I understood, we could implement it by calculating the weights using az.loo and then use the weights to calculate the weighted expectation, something like (values * weights).sum()/weights.sum(). I just wanted to know if I'm on the right track?

sethaxen commented 7 months ago

That would be one approach. loo both computes the log-weights using psislw and then estimates the ELPD. Technically this function only needs the log-weights from loo, so it would be more efficient to just call psislw as loo does.

something like (values * weights).sum()/weights.sum().

Yep, this is the right approach, with a few notes:

Namely, in addition to using psis(-log_likelihood) to get the LOO log-weights, you would also compute something like

summand = np.sign(values) * np.exp(np.log(np.abs(values)) - log_likelihood))
log_abs_summand = np.abs(values) - log_likelihood
max_log_abs_summand = log_abs_summand.max()
summand_scaled = np.sign(values) * np.exp(max_log_abs_summand - max_log_abs_summand)
tail_right = ...  # select right tail of summand_scaled using approach of https://github.com/arviz-devs/arviz/blob/6b1b2be60cca804d4b0ec43a3303820bfa8785c6/arviz/stats/stats.py#L974-L980
tail_left = ...  # do the same but for `-summand_scaled`
k_hat_right, _ = _gpdfit(tail_right)
k_hat_left, _ = _gpdfit(tail_left) 
k_hat = max(k_hat_right, k_hat_left)

It might be better to refactor _psislw slightly to reduce code reuse. k_hat should at least be used to warn if an expectation will be poorly estimated, but a user may optionally want it returned. Some reworking is probably needed if you want to compute expectations with respect to multiple functions simultaneously. (IIRC, this is effectively what loo_pit does and needs)

@avehtari does this look right?

FWIW, I would support adding loo_expectation in one PR and then adding the diagnostics in a later PR.

avehtari commented 7 months ago

Corresponding R functions are in https://github.com/stan-dev/loo/blob/master/R/E_loo.R, see specifically .wmean, .wvar, .wsd, .wquant. loo package is storing log weights, but in these computations the normalized weights are computed and used in .w* functions.

For the diagnostics, loo package has now also minimum sample size and sample size dependent khat-threshold. The new diagnostic summary messages are in a PR https://github.com/n-kall/posterior/blob/9a59fb57fbd7947a06eab908b3b2209fc60a2146/R/pareto_smooth.R#L611, and will be merged before the next release.

aadya940 commented 7 months ago

@sethaxen Thanks for the quick reply :+1: . I'll keep these points in mind and try to submit a draft PR implementing the loo_expectation function, you can review it whenever you have time.