arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

LOO PIT documentation/warnings and user experience in general #1702

Open OriolAbril opened 3 years ago

OriolAbril commented 3 years ago

loo_pit and plot_loo_pit have had several issues for a while, both in code/design and in documentation. This issue aims to be a bit of a brainstorm: first gather all the problematic points related to them, then suggest fixes. cc @avehtari @aloctavodia
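For context, a minimal sketch of the current usage (with ArviZ's bundled example data; exact defaults may change as part of this redesign):

```python
import arviz as az

# Bundled example InferenceData with the posterior_predictive and
# log_likelihood groups that loo_pit needs.
idata = az.load_arviz_data("centered_eight")

# Compute (smoothed) LOO-PIT values for the observed variable "obs".
pit_vals = az.loo_pit(idata, y="obs")

# Plot them; ecdf=True gives the ECDF-difference view instead of the KDE.
az.plot_loo_pit(idata, y="obs", ecdf=True)
```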

Related to this last point, how about a table with ECDF-difference plots and/or KDE plots for the following combinations (a sketch of how such reference panels could be generated follows the table):

| bias \ dispersion | over | under |
| --- | --- | --- |
| positive | *image of overdispersion and positive bias* | *image of underdispersion and positive bias* |
| negative | *image of overdispersion and negative bias* | *image of underdispersion and negative bias* |
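One way to generate such reference panels, as a sketch: simulate data from a standard normal and compute PIT values under deliberately mis-specified normal predictives (the bias/scale values below are illustrative, not ArviZ API):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(0, 1, size=1000)  # "observed" data from a standard normal

# Mis-specified predictive distributions as (bias, scale) pairs
cases = {
    "positive bias, overdispersed": (0.5, 2.0),
    "positive bias, underdispersed": (0.5, 0.5),
    "negative bias, overdispersed": (-0.5, 2.0),
    "negative bias, underdispersed": (-0.5, 0.5),
}

fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (title, (mu, sigma)) in zip(axes.ravel(), cases.items()):
    # PIT of the data under the wrong predictive distribution
    pit = stats.norm.cdf(y, loc=mu, scale=sigma)
    pit_sorted = np.sort(pit)
    ecdf = np.arange(1, len(pit) + 1) / len(pit)
    # ECDF difference: empirical CDF minus the uniform CDF
    ax.plot(pit_sorted, ecdf - pit_sorted)
    ax.axhline(0, color="gray", lw=0.5)
    ax.set_title(title, fontsize=9)
plt.tight_layout()
```

Each panel then shows the characteristic ECDF-difference shape for one bias/dispersion combination, which is exactly what the table cells would illustrate.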

Related to https://discourse.mc-stan.org/t/understanding-loo-pit-graphical-diagnostics/22633, but also to several other posts and questions on both the Stan and PyMC Discourses and on GitHub.

avehtari commented 3 years ago

The probability integral transform for discrete data is discussed in Czado, Claudia, Tilmann Gneiting, and Leonhard Held (2009). "Predictive Model Assessment for Count Data". In: Biometrics 65.4, pp. 1254–1261. Direct use of the PIT gives biased results; that paper discusses randomized and deterministic approaches for correcting the issue. For a large number of distinct states the bias is smaller.

I would prefer non-smoothed versions when they are available, which is e.g. why I favor the ECDF difference over the histogram (the ECDF difference is less well known, so people are not used to interpreting it, but the histogram hides things). We're still working on efficient computation of the envelopes for the ECDF of a discrete distribution. Binary data is the extreme discrete case, and it would be better to use other plots for it.

Note also that the posterior and LOO predictive distributions match the true distribution only asymptotically. E.g. for a normal(mu, sigma) model the posterior and LOO predictive distributions are t-distributions, and the t-distribution approaches the normal only asymptotically. The normal model example is not that bad, as the asymptotic regime is reached quickly, but in general it means that uniformity tests are not calibrated for LOO-PITs, so the graphical plot is indicative but we can't make exact statements about uniformity.
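For reference, a minimal sketch of the randomized PIT from Czado et al. (2009) for count data (the Poisson predictive here is purely illustrative, not the ArviZ implementation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def randomized_pit(y, cdf):
    """Randomized PIT for discrete data: draw u uniformly on
    [F(y - 1), F(y)]. If the predictive CDF F is the true
    data-generating distribution, u is exactly Uniform(0, 1)."""
    lower = cdf(y - 1)  # F(y - 1); equals 0 when y == 0 for count data
    upper = cdf(y)
    return lower + rng.uniform(size=len(y)) * (upper - lower)

# Illustration: Poisson data scored against the true Poisson predictive
y = rng.poisson(3.0, size=2000)
pit = randomized_pit(y, lambda k: stats.poisson.cdf(k, mu=3.0))
# pit is approximately uniform; the direct PIT cdf(y) would be biased
# toward large values because it puts all the point mass at F(y).
```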

OriolAbril commented 2 years ago

Another interesting blog post on using LOO-PIT: https://ferrine.github.io/posts/2022/Feb/01/interpreting-loo-pit/

avehtari commented 2 years ago

That reminded me that I should add LOO-PIT to the CV-FAQ (but I ran out of time today; I'll try to do that tomorrow).