TuringLang / ParetoSmooth.jl

An implementation of PSIS algorithms in Julia.
http://turinglang.org/ParetoSmooth.jl/
MIT License

Bayesian Bootstrap #20

Open ParadaCarleton opened 3 years ago

ParadaCarleton commented 3 years ago

The goal is to add the Bayesian bootstrap as a first-class alternative to leave-one-out cross-validation in this package. I believe its major advantage is that it lets users plot a full posterior distribution, rather than just a point estimate and standard error. Aside from the actual informational difference, I've noticed that plotting posteriors is a good way to help people intuitively understand that the point estimates aren't special. Show someone a point estimate and a standard error and they will usually either ignore the standard error or construct a 95% normal confidence interval.
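To make the "full posterior instead of a point estimate" idea concrete, here is a minimal sketch of a plain Bayesian bootstrap for the mean of some per-observation values. It is written in Python with NumPy purely for illustration and is not this package's API: the function name `bayesian_bootstrap_mean`, its parameters, and the synthetic input are all assumptions made for the example.

```python
import numpy as np

def bayesian_bootstrap_mean(values, n_draws=4000, seed=0):
    """Posterior draws of the mean under a plain Bayesian bootstrap.

    Hypothetical helper for illustration only; not part of ParetoSmooth.jl.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = values.shape[0]
    # Each row is one draw of observation weights from a flat Dirichlet(1, ..., 1).
    weights = rng.dirichlet(np.ones(n), size=n_draws)  # shape (n_draws, n)
    # Each posterior draw of the mean is a weighted average of the observations.
    return weights @ values

# Synthetic per-observation values (assumed data, just for the demo).
demo_values = np.random.default_rng(1).normal(loc=-1.0, scale=0.5, size=50)
demo_draws = bayesian_bootstrap_mean(demo_values)
# The full set of draws can be plotted as a histogram or summarised by quantiles.
demo_interval = np.quantile(demo_draws, [0.025, 0.975])
```

The array of draws is what would be plotted in place of a single estimate with a standard error; the quantile interval is only one possible summary of it.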

Mentioning @topipa because Aki suggested I talk to you about this, and said you've built something similar before. I know that the bootstrap tends to underestimate the bias caused by overfitting, because bootstrap resamples will be more similar to the data than a new sample from the original distribution would be -- did your own implementation use any corrections for this bias?

My own thoughts on how to correct this:

  1. Iterated bootstrap techniques, and
  2. Adding random noise to resamples -- instead of assigning a random Dirichlet-distributed probability to every observation x, we can draw random observations x + N, where N ~ Normal(0, Σ / n), and then assign a random Dirichlet probability to each of these resamples. I've seen Tim Hesterberg suggest this in his textbook, but Aki seemed to suggest it would be a bad idea. Intuitively I'd expect this to reduce the bias caused by resampling, since we'd at least be getting the variance of the underlying distribution correct, but I could be wrong.
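One possible reading of the second idea can be sketched as follows, again in Python with NumPy for illustration only. The sketch assumes that Σ is the sample covariance of the data and n the sample size, that fresh noise is drawn for every observation on every replicate, and that the statistic of interest takes the perturbed data together with the Dirichlet weights. The names `smoothed_bb_draws` and `weighted_mean` are hypothetical, and this is not claimed to match Hesterberg's procedure exactly or to settle whether the correction actually reduces the bias.

```python
import numpy as np

def smoothed_bb_draws(x, statistic, n_draws=2000, seed=0):
    """Bayesian bootstrap with Gaussian noise added to every observation.

    Hypothetical sketch; assumes Sigma is the sample covariance of x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat 1-D input as n observations of one variable
    n, d = x.shape
    cov = np.atleast_2d(np.cov(x, rowvar=False))  # sample covariance, shape (d, d)
    out = np.empty(n_draws)
    for b in range(n_draws):
        # Perturb each observation: x + N with N ~ Normal(0, Sigma / n).
        noise = rng.multivariate_normal(np.zeros(d), cov / n, size=n)
        # Assign flat-Dirichlet weights to the perturbed observations.
        w = rng.dirichlet(np.ones(n))
        out[b] = statistic(x + noise, w)
    return out

def weighted_mean(xs, w):
    # Weighted mean of the first (here, only) column.
    return float(w @ xs[:, 0])

smoothed_draws = smoothed_bb_draws(np.arange(20.0), weighted_mean, n_draws=1000)
```

Passing the statistic as a callable keeps the resampling scheme separate from whatever quantity is being estimated, so the same loop could be compared against the unperturbed Bayesian bootstrap by setting the noise to zero.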

I have an initial implementation of a basic BB here, although it's not quite working yet -- the estimates seem to be slightly off, but I'm not sure why.