aleximmer / Laplace

Laplace approximations for Deep Learning.
https://aleximmer.github.io/Laplace
MIT License

Computation Hessian GGN #110

Closed: nilsleh closed this issue 1 year ago

nilsleh commented 1 year ago

Hi, I have a question regarding the computation of the GGN approximation for the Hessian in the regression case:

def _get_full_ggn(self, Js, f, y):
    """Compute full GGN from Jacobians.

    Parameters
    ----------
    Js : torch.Tensor
        Jacobians `(batch, parameters, outputs)`
    f : torch.Tensor
        functions `(batch, outputs)`
    y : torch.Tensor
        labels compatible with loss

    Returns
    -------
    loss : torch.Tensor
    H_ggn : torch.Tensor
        full GGN approximation `(parameters, parameters)`
    """
    loss = self.factor * self.lossfunc(f, y)
    if self.likelihood == 'regression':
        H_ggn = torch.einsum('mkp,mkq->pq', Js, Js)

When I compare this with equation 9 in your paper Bayesian Deep Learning via Subnetwork Inference, I do not see the Hessian of the negative log-likelihood w.r.t. the model outputs. Why does that term vanish in the regression case, leaving just the product of the Jacobians?
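
For reference, the general GGN form I have in mind (my reading of equation 9, with $\Lambda$ denoting the Hessian of the negative log-likelihood w.r.t. the model outputs) is

$$H_\mathrm{GGN} = \sum_{n=1}^{N} J_n^\top \, \Lambda(f_n) \, J_n, \qquad \Lambda(f_n) = -\nabla^2_{f_n} \log p(y_n \mid f_n),$$

and I do not see where $\Lambda(f_n)$ enters the regression branch above.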

runame commented 1 year ago

Hi Nils,

The Hessian of the negative log-likelihood w.r.t. the model outputs is implicitly there: in the regression case it is simply the identity matrix, see Appendix A.2 of this paper.
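
If it helps, here is a small numerical check (just a sketch, not code from this library; it assumes a Gaussian likelihood with unit noise variance): the Hessian of the Gaussian NLL w.r.t. the model outputs is the identity, so the GGN term J^T Λ J reduces to J^T J, which is exactly the einsum above.

import torch
from torch.autograd.functional import hessian

# For a Gaussian likelihood with unit noise variance,
# -log p(y | f) = 0.5 * ||y - f||^2 + const,
# so its Hessian w.r.t. the model outputs f is the identity.
y = torch.randn(3)                      # targets for one input with 3 outputs

def nll(f):
    return 0.5 * ((f - y) ** 2).sum()   # Gaussian NLL up to an additive constant

f = torch.randn(3)                      # model outputs at which to evaluate
H = hessian(nll, f)                     # Hessian w.r.t. the outputs, shape (3, 3)
print(torch.allclose(H, torch.eye(3)))  # True -> J^T @ H @ J == J^T @ J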

nilsleh commented 1 year ago

Awesome, thank you! I was not aware of that.