[Open] smartArancina opened this issue 1 week ago
Hi, these are great questions. I will try to clarify: `FunctionalLaplace` and `FullLaplace` are equivalent; see for example this test https://github.com/aleximmer/Laplace/blob/main/tests/test_functional_laplace.py#L33. So they should not lead to different results unless you start making additional approximations to either the parametric or the functional posterior, such as subset-of-data. I hope this answers your questions; otherwise feel free to follow up.
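For completeness, here is a minimal sketch of the comparison that test performs, assuming the `laplace-torch` API (the exact constructor arguments, e.g. `n_subset`, may differ across versions):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from laplace import FullLaplace, FunctionalLaplace

# Tiny 1D regression problem (illustrative data only)
torch.manual_seed(0)
X = torch.linspace(-1, 1, 50).unsqueeze(-1)
y = torch.sin(3 * X) + 0.1 * torch.randn_like(X)
loader = DataLoader(TensorDataset(X, y), batch_size=50)

# Assume this model has already been trained to the MAP estimate
model = nn.Sequential(nn.Linear(1, 20), nn.Tanh(), nn.Linear(20, 1))

# Parametric (weight-space) Laplace with full covariance
full_la = FullLaplace(model, likelihood='regression')
full_la.fit(loader)

# Functional (function-space / GP) Laplace; using all N points,
# i.e. no subset-of-data approximation, so the two should match
fun_la = FunctionalLaplace(model, likelihood='regression', n_subset=len(X))
fun_la.fit(loader)

f_mu_full, f_var_full = full_la(X)
f_mu_fun, f_var_fun = fun_la(X)
print(torch.allclose(f_mu_full, f_mu_fun, atol=1e-4))
```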
Hi @aleximmer, thanks a lot for the answers, they helped a lot. Yes, on question 2 I used subset-of-data (SoD) in my example, which produced the difference that confused me. So, happy to see that on the entire dataset `FunctionalLaplace` and `FullLaplace` should match. But now I have other questions! :) Why would one use `FunctionalLaplace` over `FullLaplace` in a regression setting? Thanks again.
Premise
Sorry if some questions (or all of them! :)) seem stupid / basic / totally wrong. I am in the process of self-studying GPs, improving my Bayesian knowledge, and learning the Laplace approximation with NNs. I am falling in love with these topics, but I certainly have to understand them better!
Description
In particular I am referring to the original paper: Improving predictions of Bayesian neural nets via local linearization
Below I show the reasoning flow as I understood it.
Weight space
From sections 3.1 - 3.2 of the paper I understood that using the GGN to approximate the weights posterior is justified when we use the linearized model

$$f_{\mathrm{lin}}(x; \theta) = f(x; \theta_*) + J_{\theta_*}(x)\,(\theta - \theta_*)$$

in the likelihood, and this leads to a GLM where the found approximate posterior is

$$q(\theta) = \mathcal{N}\!\left(\theta;\, \theta_*,\, \Sigma\right), \qquad \Sigma = \left( \sum_{n=1}^{N} J_{\theta_*}(x_n)^\top \Lambda_n\, J_{\theta_*}(x_n) + S_0^{-1} \right)^{-1},$$

with $\Lambda_n = -\nabla_f^2 \log p(y_n \mid f(x_n))$ and prior covariance $S_0$; this is then the approximate posterior distribution used for prediction.
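To make this concrete, here is a small self-contained sketch (all names hypothetical; $\theta_*$ is assumed to be an already-trained MAP estimate) of assembling that GGN posterior covariance for a scalar-output regression network with Gaussian likelihood, where $\Lambda_n = 1/\sigma^2$:

```python
import torch
from torch import nn
from torch.func import functional_call, jacrev

torch.manual_seed(0)
X = torch.linspace(-1, 1, 30).unsqueeze(-1)
model = nn.Sequential(nn.Linear(1, 10), nn.Tanh(), nn.Linear(10, 1))
params = {k: v.detach() for k, v in model.named_parameters()}

def f(p, x):
    # Scalar network output for one input x, as a function of the weights
    return functional_call(model, p, (x.unsqueeze(0),)).squeeze(0)

def jacobian_row(x):
    # J_{theta_*}(x): gradient of the output w.r.t. all flattened weights
    J = jacrev(f)(params, x)
    return torch.cat([j.reshape(-1) for j in J.values()])

J = torch.stack([jacobian_row(x) for x in X])   # shape (N, P)
sigma2, sigma0_2 = 0.1**2, 1.0                  # noise and prior variances

# Sigma = (J^T Lambda J + S_0^{-1})^{-1}, with Lambda = I / sigma^2
P = J.shape[1]
precision = J.T @ J / sigma2 + torch.eye(P) / sigma0_2
Sigma = torch.linalg.inv(precision)
```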
Function space (GP)
We define the full GP posterior inference using GGN + GLM as follows, starting from the prior in function space implied by the linearization:

$$f \sim \mathcal{GP}\big(m, \kappa\big), \qquad m(x) = f(x; \theta_*), \qquad \kappa(x, x') = J_{\theta_*}(x)\, S_0\, J_{\theta_*}(x')^\top.$$
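To see the function-space route end to end, here is a sketch of GP regression with the induced kernel under a zero-mean diagonal prior $S_0 = \sigma_0^2 I$; a fixed feature map stands in for the Jacobian $J_{\theta_*}(x)$ (so the linearization is exact), and all names and values are illustrative assumptions:

```python
import torch

torch.manual_seed(0)

def J(x):
    # Stand-in for the network Jacobian J_{theta_*}(x), used as a feature map
    W = torch.linspace(0.5, 3.0, 8).unsqueeze(-1)
    return torch.cat([torch.sin(x @ W.T), torch.cos(x @ W.T)], dim=-1)

X = torch.linspace(-1, 1, 30).unsqueeze(-1)           # train inputs
y = torch.sin(3 * X).squeeze(-1) + 0.1 * torch.randn(30)
Xs = torch.linspace(-1.5, 1.5, 50).unsqueeze(-1)      # test inputs
sigma2, sigma0_2 = 0.1**2, 1.0                        # noise / prior variances

# Kernel matrices from kappa(x, x') = sigma0^2 * J(x) J(x')^T
K = sigma0_2 * J(X) @ J(X).T                          # K_XX
Ks = sigma0_2 * J(Xs) @ J(X).T                        # K_*X
Kss = sigma0_2 * J(Xs) @ J(Xs).T                      # K_**

# Standard GP regression conditioning (Gaussians closed under conditioning)
A = K + sigma2 * torch.eye(30)
mu_star = Ks @ torch.linalg.solve(A, y)               # posterior mean
cov_star = Kss - Ks @ torch.linalg.solve(A, Ks.T)     # posterior covariance
```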
Questions
1) Why do we use the prior in the GP formulation instead of the approximated posterior found with the GGN?

2) What is the difference between `FunctionalLaplace` and `FullLaplace` (mathematical formulas apart)? That is: why does doing the posterior inference in function space, starting directly from the functional prior, lead to different results? I thought that the NN linearization + Gaussian likelihood resulted in the GP formulation below (using the linearity property of Gaussians and the fact that they are closed under conditioning and marginalization, plus assuming a zero-mean, diagonal prior $S_0 = \sigma_0^2 I$ on the weights):

$$p(f_* \mid x_*, \mathcal{D}) = \mathcal{N}(f_*;\, \mu_*, \Sigma_*), \qquad \mu_* = m(x_*) + K_{*X}\left(K_{XX} + \sigma^2 I\right)^{-1}\big(y - m(X)\big), \qquad \Sigma_* = K_{**} - K_{*X}\left(K_{XX} + \sigma^2 I\right)^{-1} K_{X*},$$

with the kernel matrices built from $\kappa(x, x') = \sigma_0^2\, J_{\theta_*}(x)\, J_{\theta_*}(x')^\top$. The `FullLaplace` implementation, if I understood correctly, should instead follow the weight-space approach

$$p(f_* \mid x_*, \mathcal{D}) = \mathcal{N}\big(f_*;\, f(x_*; \theta_*),\; J_{\theta_*}(x_*)\, \Sigma\, J_{\theta_*}(x_*)^\top\big).$$

3) In the paper you showed approximating the weights posterior predictive distribution by sampling because we are assuming a general likelihood, right? Otherwise, in the case of a Gaussian likelihood, we should have the closed formula shown in question 2), right?
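On question 3, a small numerical sketch of what I mean (hypothetical quantities: `f_map` plays the role of $f(x_*; \theta_*)$, `Js` of $J_{\theta_*}(x_*)$, `Sigma` of the weight posterior covariance): with a Gaussian likelihood the linearized predictive is available in closed form, and Monte Carlo sampling of the weights should only be needed for general likelihoods.

```python
import torch

torch.manual_seed(0)
P = 5
f_map = torch.tensor([0.3])             # f(x_*; theta_*)
Js = torch.randn(1, P)                  # J_{theta_*}(x_*)
L = torch.randn(P, P)
Sigma = L @ L.T + torch.eye(P)          # some SPD posterior covariance
sigma2 = 0.1**2

# Closed form: f_* ~ N(f_map, Js Sigma Js^T); y_* adds observation noise
f_var = Js @ Sigma @ Js.T
y_var = f_var + sigma2

# Monte Carlo check: sample weight perturbations and push them through
# the *linearized* model f_map + Js @ delta
C = torch.linalg.cholesky(Sigma)
deltas = torch.randn(100_000, P) @ C.T
f_samples = f_map + deltas @ Js.T
print(f_var.item(), f_samples.var().item())   # should roughly agree
```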