aleximmer / Laplace

Laplace approximations for Deep Learning.
https://aleximmer.github.io/Laplace
MIT License

The role of the prior #108

Open ArturPrzybysz opened 1 year ago

ArturPrzybysz commented 1 year ago

Hi! Thank you for your work on the project. I have a question that does not exactly concern this repository but rather the theory behind it. I am confused about the role of the prior in the Laplace approximation.

I thought that the goal is to estimate the posterior $p(w|\mathcal{D})$:

$$ p(w|\mathcal{D}) = \frac{ p(\mathcal{D}|w) p(w) }{ p(\mathcal{D}) } = \frac{1}{Z} f(w) \approx \mathcal{N}(w|m, S) $$

then apply the Taylor expansion and so on, so that finally we can make predictions, for example by sampling from the weight distribution $p(w|\mathcal{D})$.
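Just to spell out my understanding of the expansion step (restating the standard derivation, so please correct me if this is off): expanding $\log f(w)$ to second order around the MAP $w_{\text{MAP}}$ gives

$$ \log f(w) \approx \log f(w_{\text{MAP}}) - \frac{1}{2} (w - w_{\text{MAP}})^\top H (w - w_{\text{MAP}}), \qquad H = -\nabla^2_w \log f(w) \big|_{w_{\text{MAP}}}, $$

so $m = w_{\text{MAP}}$ and $S = H^{-1}$.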

This is the idea I have right now; however, I don't understand where post-hoc prior precision tuning fits in. Sorry if the question is trivial, I am just trying to clarify my view of the theory behind Laplace.

aleximmer commented 1 year ago

Hi Artur, that's indeed a good question. Theoretically, there is no clear justification for tuning the prior precision post hoc, but it has been observed in several papers that this improves performance or is even required for good performance. A possible justification is that we do not end up at a true MAP when optimizing the neural network with SGD for a limited number of epochs. However, it can also be seen as a type of temperature scaling, because changing the prior post hoc can artificially concentrate or widen the posterior predictive.
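To make the temperature-scaling analogy concrete (a sketch, assuming an isotropic Gaussian prior $p(w) = \mathcal{N}(0, \delta^{-1} I)$ with precision $\delta$): the prior precision enters the posterior covariance directly,

$$ S = \left( -\nabla^2_w \log p(\mathcal{D} \mid w) \big|_{w_{\text{MAP}}} + \delta I \right)^{-1}, $$

so increasing $\delta$ post hoc shrinks $S$ and concentrates the posterior predictive, while decreasing it widens the predictive, even though $w_{\text{MAP}}$ itself stays fixed.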

We discuss this in Appendix B.3 of the corresponding paper, where we also describe the online empirical Bayes method for optimizing the prior precision. This requires no post-hoc adjustment of the posterior predictive, but it also does not seem to profit much from using the posterior predictive over the MAP, as we discuss in Appendix C.
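In terms of the library, a minimal sketch of the post-hoc route looks roughly like this (names such as `model`, `train_loader`, and `x_test` are placeholders for your own objects):

```python
from laplace import Laplace

# Fit a (last-layer, Kronecker-factored) Laplace approximation around the trained MAP.
la = Laplace(model, 'classification',
             subset_of_weights='last_layer',
             hessian_structure='kron')
la.fit(train_loader)

# Post-hoc prior precision tuning via the marginal likelihood (empirical Bayes).
la.optimize_prior_precision(method='marglik')

# The tuned prior precision only rescales the posterior covariance;
# the MAP estimate (the network weights) is left untouched.
pred = la(x_test, link_approx='probit')
```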