aleximmer / Laplace

Laplace approximations for Deep Learning.
https://aleximmer.github.io/Laplace
MIT License

The role of the prior #108

Open ArturPrzybysz opened 1 year ago

ArturPrzybysz commented 1 year ago

Hi! Thank you for your work on the project. I have a question that does not exactly concern this repository but rather the theory behind it. I am confused about the role of the prior in the Laplace approximation.

I thought that the goal is to estimate the posterior $p(w|\mathcal{D})$:

$$ p(w|\mathcal{D}) = \frac{ p(\mathcal{D}|w) p(w) }{ p(\mathcal{D}) } = \frac{1}{Z} f(w) \approx \mathcal{N}(w|m, S) $$

then apply the Taylor expansion and so on, so that finally we can make predictions, for example by sampling from the weight distribution $p(w|\mathcal{D})$.
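Just to spell out my understanding of the expansion step (restating the standard derivation, so please correct me if this is off): expanding $\log f(w)$ to second order around the MAP $w_{\text{MAP}}$ gives

$$ \log f(w) \approx \log f(w_{\text{MAP}}) - \frac{1}{2} (w - w_{\text{MAP}})^\top H (w - w_{\text{MAP}}), \qquad H = -\nabla^2_w \log f(w) \big|_{w_{\text{MAP}}}, $$

so $m = w_{\text{MAP}}$ and $S = H^{-1}$.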

This is the idea I have right now; however, I don't understand where post-hoc prior precision tuning fits in. Sorry if the question is trivial, I am just trying to clarify my view of the theory behind Laplace.

aleximmer commented 1 year ago

Hi Artur, that's indeed a good question. Theoretically, there is no clear justification for tuning the prior precision post hoc, but it has been observed in several papers that this improves performance or is even required for good performance. A possible justification is that we do not end up at a true MAP when optimizing the neural network with SGD for a limited number of epochs. However, it can also be seen as a type of temperature scaling, because changing the prior post hoc can artificially concentrate or widen the posterior predictive.
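To make the temperature-scaling analogy concrete (a sketch, assuming an isotropic Gaussian prior $p(w) = \mathcal{N}(0, \delta^{-1} I)$ with precision $\delta$): the prior precision enters the posterior covariance directly,

$$ S = \left( -\nabla^2_w \log p(\mathcal{D} \mid w) \big|_{w_{\text{MAP}}} + \delta I \right)^{-1}, $$

so increasing $\delta$ post hoc shrinks $S$ and concentrates the posterior predictive, while decreasing it widens the predictive, even though $w_{\text{MAP}}$ itself stays fixed.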

We discuss this in Appendix B.3 of the corresponding paper, where we also describe the online empirical Bayes method for optimizing the prior precision. This requires no post-hoc adjustment of the posterior predictive, but it also does not seem to profit much from using the posterior predictive over the MAP, as we discuss in Appendix C.
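In terms of the library, a minimal sketch of the post-hoc route looks roughly like this (names such as `model`, `train_loader`, and `x_test` are placeholders for your own objects):

```python
from laplace import Laplace

# Fit a (last-layer, Kronecker-factored) Laplace approximation around the trained MAP.
la = Laplace(model, 'classification',
             subset_of_weights='last_layer',
             hessian_structure='kron')
la.fit(train_loader)

# Post-hoc prior precision tuning via the marginal likelihood (empirical Bayes).
la.optimize_prior_precision(method='marglik')

# The tuned prior precision only rescales the posterior covariance;
# the MAP estimate (the network weights) is left untouched.
pred = la(x_test, link_approx='probit')
```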