arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

LOO PIT documentation/warnings and user experience in general #1702

Open OriolAbril opened 3 years ago

OriolAbril commented 3 years ago

loo_pit and plot_loo_pit have had several issues for a while, both in code/design and in documentation. This issue aims to be a bit of a brainstorm: first gather all the problematic points related to them, then suggest fixes. cc @avehtari @aloctavodia
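For context, a minimal sketch of the current usage (with ArviZ's bundled example data; exact defaults may change as part of this redesign):

```python
import arviz as az

# Bundled example InferenceData with the posterior_predictive and
# log_likelihood groups that loo_pit needs.
idata = az.load_arviz_data("centered_eight")

# Compute (smoothed) LOO-PIT values for the observed variable "obs".
pit_vals = az.loo_pit(idata, y="obs")

# Plot them; ecdf=True gives the ECDF-difference view instead of the KDE.
az.plot_loo_pit(idata, y="obs", ecdf=True)
```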

Related to this last point, how about a table with ECDF-difference plots and/or KDE plots for the following combinations (a sketch of how such reference panels could be generated follows the table):

| bias \ dispersion | over | under |
| --- | --- | --- |
| positive | *image of overdispersion and positive bias* | *image of underdispersion and positive bias* |
| negative | *image of overdispersion and negative bias* | *image of underdispersion and negative bias* |
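One way to generate such reference panels, as a sketch: simulate data from a standard normal and compute PIT values under deliberately mis-specified normal predictives (the bias/scale values below are illustrative, not ArviZ API):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(0, 1, size=1000)  # "observed" data from a standard normal

# Mis-specified predictive distributions as (bias, scale) pairs
cases = {
    "positive bias, overdispersed": (0.5, 2.0),
    "positive bias, underdispersed": (0.5, 0.5),
    "negative bias, overdispersed": (-0.5, 2.0),
    "negative bias, underdispersed": (-0.5, 0.5),
}

fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (title, (mu, sigma)) in zip(axes.ravel(), cases.items()):
    # PIT of the data under the wrong predictive distribution
    pit = stats.norm.cdf(y, loc=mu, scale=sigma)
    pit_sorted = np.sort(pit)
    ecdf = np.arange(1, len(pit) + 1) / len(pit)
    # ECDF difference: empirical CDF minus the uniform CDF
    ax.plot(pit_sorted, ecdf - pit_sorted)
    ax.axhline(0, color="gray", lw=0.5)
    ax.set_title(title, fontsize=9)
plt.tight_layout()
```

Each panel then shows the characteristic ECDF-difference shape for one bias/dispersion combination, which is exactly what the table cells would illustrate.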

Related to https://discourse.mc-stan.org/t/understanding-loo-pit-graphical-diagnostics/22633, but also to several other posts and questions on both the Stan and PyMC Discourses and on GitHub.

avehtari commented 3 years ago

The probability integral transform for discrete data is discussed in Czado, Claudia, Tilmann Gneiting, and Leonhard Held (2009). "Predictive Model Assessment for Count Data". In: Biometrics 65.4, pp. 1254–1261. Direct use of the PIT gives biased results; that paper discusses randomized and deterministic approaches for correcting the issue. For a large number of distinct states the bias is smaller.

I would prefer non-smoothed versions when they are available, which is e.g. why I favor the ECDF difference over the histogram (the ECDF difference is less well known, so people are not used to interpreting it, but the histogram hides things). We're still working on efficient computation of the envelopes for the ECDF of a discrete distribution. Binary data is the extreme discrete case, and it would be better to use other plots for it.

Note also that the posterior and LOO predictive distributions match the true distribution only asymptotically. E.g. for a normal(mu, sigma) model the posterior and LOO predictive distributions are t-distributions, and the t-distribution approaches the normal only asymptotically. The normal model example is not that bad, as the asymptotic regime is reached quickly, but in general it means that uniformity tests are not calibrated for LOO-PITs, so the graphical plot is indicative but we can't make exact statements about uniformity.
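For reference, a minimal sketch of the randomized PIT from Czado et al. (2009) for count data (the Poisson predictive here is purely illustrative, not the ArviZ implementation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def randomized_pit(y, cdf):
    """Randomized PIT for discrete data: draw u uniformly on
    [F(y - 1), F(y)]. If the predictive CDF F is the true
    data-generating distribution, u is exactly Uniform(0, 1)."""
    lower = cdf(y - 1)  # F(y - 1); equals 0 when y == 0 for count data
    upper = cdf(y)
    return lower + rng.uniform(size=len(y)) * (upper - lower)

# Illustration: Poisson data scored against the true Poisson predictive
y = rng.poisson(3.0, size=2000)
pit = randomized_pit(y, lambda k: stats.poisson.cdf(k, mu=3.0))
# pit is approximately uniform; the direct PIT cdf(y) would be biased
# toward large values because it puts all the point mass at F(y).
```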

OriolAbril commented 2 years ago

Another interesting blog post on using LOO-PIT: https://ferrine.github.io/posts/2022/Feb/01/interpreting-loo-pit/

avehtari commented 2 years ago

That reminded me that I should add LOO-PIT to the CV-FAQ (but I ran out of time today; I'll try to do that tomorrow).