PIT tests with discrete posterior predictive?

athowes commented 4 weeks ago

In writing tests for the function posterior_predict_latent_lognormal I've generated quantiles of the data within the posterior predictive distribution:

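quantiles <- purrr::map_vec(seq_len(max_i), function(obs) {
  # Posterior predictive draws for observation `obs`. `.env$obs` is needed
  # because `i == i` inside filter() compares the column with itself.
  draws <- dplyr::filter(df, i == .env$obs)$y
  # Empirical CDF of the draws, evaluated at the observed value
  stats::ecdf(draws)(prep$data$Y[obs])
})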

I'm interested in whether these quantiles are distributed as you'd expect. The challenge is that the data are discrete, e.g.

> prep$data$Y
  [1]  3  4  5  5  5  9  5  3  6  6  5  6  6  6  8  5  7  6  4  6  3  6  5 10  4  9  3  5  7  4  6  5  7  6  5  6  3  4  5  8 10  4  4  6  7  3  3  4  8  6  2 10  8  3  6  6  3  4  5  5  5  3  4  9  5  2  5
 [68]  6  5  8  2  7  7  9  4  4 15 12  3  4  5  4  5  4  5  6  7  4  3  4  6  6  2  6  4  3  5  4  5  4  4 10  4  1  2 10  7  2 11 12  8  7  2  8  2  3  6  7  3  7  4  5  4  5  5  7  5  5  8  6  3  2  4  4
[135]  5  5  4  4  3  5  4  3  4  6  7  4  4  6  7  4  3  3  2  6  8  9  6  3  7  9  7  5  4  6  4  7  4  5  3  4  9  7  5  5  6  6  8  5  5  8  6 10  2  7  4  5  7  3  4  9  2  3  2  3  5  3  9  7  6  5  2
[202]  6  3  2  4  4 10  2  3  7  5  5  6  4  6  8  5  5  7 10  7  4  8  8  6  2  6  7  6  5  3  3  5  3  6  3  2  3  5  5  4  5  5  4  3  7  6  3  4 12  7  3  3  7  7  9  4  7  4  5  6  1  4  9  5 10  4  2
[269]  9  3  7  3  6  4  7  5  3  8  4  3  6 11  4  9 21  4  6  4  5  5  4  8  6  5  8  5  5  8  5  4  6  7  4  4  4 11  4  3  5  4  4  5  5  5  3  3  6  6  8  6  6  6  4  2  7  3  8  4  3  5  3  6  6  4 12
[336]  9 10  3  4 10  3  5  4  3  3  6 13  7  5  2  7  2  4  5  4  7  4  5  2  6  4  7  5  3  9  4  5  5  5  5  5  9  3  9  2  7  4  5  4  4  4  6  4  3  6  3 13  7  9  9  6  4  5  8  5  6  5  3  2  4  4  6
[403]  7  5  3  3  6  5 15  3  3  8  5  2  4  7  1  6  5  2  2  7  2  4  3 10  2  9  4 10  6  4  5  4  3  6  3  8  4  5  5  6  3  4  5 15  3  3  5 13  3  5  6  7  9  9  4  5  9  3 10  4  4  3  9  1  5  4  4
[470] 11  4  6  7  9  6  3  2  5  5  4  7  4  3  4  4  4  4  5  7  5  5  6  9  2  2  5 10  7  5  5

Hence, how can I test this? What distribution is expected? It's not $\mathcal{U}(0, 1)$ like in the continuous case.

Here is what the quantiles look like:

[Image: plot of the quantiles]

And a summary:

> summary(quantiles)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0066  0.3927  0.5720  0.5473  0.8205  0.9998

seabbs commented 4 weeks ago

Could look at how scoringutils handles PIT histograms in the discrete case; I remember there being an adaptation/approximation.

sbfnk commented 1 week ago

Some details on how to handle this are in

Czado, C., Gneiting, T., & Held, L. (2009). Predictive Model Assessment for Count Data. Biometrics, 65(4), 1254–1261. https://doi.org/10.1111/j.1541-0420.2009.01191.x
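
For illustration, one option discussed in that paper is the randomised PIT: instead of using $F(y)$ directly, draw $u = F(y - 1) + v \, (F(y) - F(y - 1))$ with $v \sim \mathcal{U}(0, 1)$, which should again be $\mathcal{U}(0, 1)$ if the predictive distribution is calibrated. Below is a minimal sketch of that idea in base R. The helper `randomised_pit()` and the toy Poisson data are assumptions made up for this example (they are not part of epidist or scoringutils), and the empirical CDF of the draws stands in for $F$.

```r
set.seed(1)

# Hypothetical helper: randomised PIT for a single count observation.
# y: observed count; draws: posterior predictive draws for that observation.
randomised_pit <- function(y, draws) {
  p_lower <- mean(draws <= y - 1)  # empirical F(y - 1)
  p_upper <- mean(draws <= y)      # empirical F(y)
  # Spread the probability mass at y uniformly over [F(y - 1), F(y)]
  p_lower + stats::runif(1) * (p_upper - p_lower)
}

# Toy calibrated setting (an assumption for illustration): the observations
# and the predictive draws come from the same Poisson distribution.
n_obs <- 500
n_draws <- 1000
lambda <- stats::runif(n_obs, 2, 8)
y_obs <- stats::rpois(n_obs, lambda)

pit <- vapply(seq_len(n_obs), function(i) {
  randomised_pit(y_obs[i], stats::rpois(n_draws, lambda[i]))
}, numeric(1))

# With a calibrated predictive distribution these values should look
# approximately uniform, so a uniformity check becomes meaningful again.
ks <- stats::ks.test(pit, "punif")
```

In the test from the original post, `draws` would be the `y` column of `df` for a given `i` and `y` would be `prep$data$Y[i]`. Because of the extra random draw, the seed needs fixing for the test to be reproducible, and any threshold on the test statistic should be fairly lenient. The paper also describes a non-randomised version that avoids the extra noise.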

epinowcast / epidist

PIT tests with discrete posterior predictive? #247