I am trying to compute influences for different subsets of training data, and sometimes I get an error that I can't explain. The error I am talking about is this one:

It's a `LinAlgError` saying that the input matrix is singular, i.e. det(matrix) = 0, so the matrix cannot be inverted. From the `torch_differentiable.py` file I found how the matrix is computed; if I'm reading it correctly, it's:

```python
hessian = model.hessian(torch.cat(all_x), torch.cat(all_y))
matrix = hessian + hessian_perturbation * torch.eye(
    model.num_params, device=model.device
)
```

As an experiment, I copy-pasted these lines to compute the matrix and check whether its determinant is actually zero. I found that even when the code runs without this error, the determinant is still 0. Likewise, if I compute the matrix in your example notebooks, the determinant is zero there too, yet `compute_influences` doesn't raise this error, which seems a bit odd.
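One likely explanation for the zero determinant, offered as a sketch rather than a diagnosis of this particular model: the determinant is the product of all eigenvalues, so for a matrix with as many rows as a network has parameters it can underflow to exactly 0.0 in floating point even when the matrix is well-conditioned and inverts without complaint. The toy example below uses NumPy (only to stay self-contained; the size and values are arbitrary assumptions) to show this, along with two more informative checks, `np.linalg.slogdet` and `np.linalg.cond`.

```python
import numpy as np

# Toy stand-in for a large Hessian: 200 x 200, every eigenvalue equal to 1e-3.
# It is perfectly conditioned and trivially invertible.
n = 200
m = 1e-3 * np.eye(n)

det = np.linalg.det(m)               # product of 200 factors of 1e-3 = 1e-600, underflows
sign, logdet = np.linalg.slogdet(m)  # log|det| stays representable
cond = np.linalg.cond(m)             # largest / smallest singular value
m_inv = np.linalg.inv(m)             # inversion succeeds, no LinAlgError

print(det)     # 0.0
print(sign)    # 1.0
print(logdet)  # about -1381.55, i.e. 200 * ln(1e-3)
print(cond)    # about 1.0
```

For a real Hessian, the smallest eigenvalues (for example from `torch.linalg.eigvalsh` on a small model) or the condition number say far more about whether a solve will fail than `det` does.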

Can you tell me why this is happening, or what I might be doing wrong and could change?

Thank you!

@ntheol It is commonly observed that the Hessians of deep neural networks become numerically singular at some point. The parameter `hessian_perturbation` is there to avoid directly solving a singular system: adding `hessian_perturbation` times the identity shifts every eigenvalue up by that amount, so eigenvalues at zero are moved away from it. The default value is 0.0, so please try adding a small perturbation. Let me know if this explanation helps.
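A small NumPy sketch of what the perturbation does to the spectrum. The toy Hessian is hypothetical, built as `Q diag(eigs) Q^T` with one exactly zero eigenvalue; adding `hessian_perturbation * I` leaves the eigenvectors unchanged and raises every eigenvalue by the same amount.

```python
import numpy as np

# Hypothetical toy "Hessian": symmetric PSD with one exactly zero eigenvalue,
# built as Q diag(eigs) Q^T from a random orthogonal Q.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
eigs = np.array([3.0, 1.0, 0.5, 0.0])
hessian = q @ np.diag(eigs) @ q.T

hessian_perturbation = 1e-2
matrix = hessian + hessian_perturbation * np.eye(4)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
before = np.linalg.eigvalsh(hessian)
after = np.linalg.eigvalsh(matrix)
print(before)  # approximately [0, 0.5, 1, 3]
print(after)   # approximately [0.01, 0.51, 1.01, 3.01]
```

The price is bias: the solve then returns `(H + λI)^{-1} g` instead of `H^{-1} g`, and the larger λ is relative to the eigenvalues that matter, the more the resulting influence values are distorted.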

@schroedk Thank you, I think it's okay now!

@schroedk Just out of curiosity, what values do you consider good for `hessian_perturbation`? Should it be something like 0.5, or smaller?

The perturbation is there to nudge small eigenvalues of the Hessian away from zero. Without knowledge of the spectrum, or at least some bound on the lowest eigenvalue, it is not possible to determine a good value. One thing you can try is to perform the inversion with some perturbation and, if it fails, try again with the perturbation doubled, and so on, until it no longer fails. However, it can be preferable to avoid direct inversion altogether and use the Arnoldi iteration instead: it approximates the dominant eigenvalues and so avoids the lowest ones (although the rank estimate has to be chosen carefully so that no vanishing eigenvalues are included).
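Both suggestions can be sketched in NumPy on an explicit, small Hessian. `solve_with_retry` and `truncated_inverse_apply` are hypothetical helpers, not part of the library: the first retries `np.linalg.solve` with a doubled perturbation whenever it raises `LinAlgError`; the second keeps only the `rank` largest-magnitude eigenpairs, with a dense `np.linalg.eigh` standing in for the Arnoldi iteration, which would obtain those eigenpairs from Hessian-vector products without ever forming the matrix.

```python
import numpy as np


def solve_with_retry(hessian, rhs, perturbation=0.0, min_perturbation=1e-8, max_tries=60):
    """Solve (hessian + perturbation * I) x = rhs, doubling the perturbation each
    time np.linalg.solve reports a singular matrix. Hypothetical helper."""
    n = hessian.shape[0]
    for _ in range(max_tries):
        try:
            x = np.linalg.solve(hessian + perturbation * np.eye(n), rhs)
            return x, perturbation
        except np.linalg.LinAlgError:
            # Double the shift; start from min_perturbation if it was 0.
            perturbation = max(2.0 * perturbation, min_perturbation)
    raise np.linalg.LinAlgError(f"matrix still singular after {max_tries} attempts")


def truncated_inverse_apply(hessian, rhs, rank):
    """Approximate hessian^{-1} @ rhs using only the `rank` eigenpairs of the
    symmetric `hessian` with the largest |eigenvalue|. Hypothetical helper; the
    dense eigh call stands in for an iterative Arnoldi/Lanczos method."""
    eigvals, eigvecs = np.linalg.eigh(hessian)
    top = np.argsort(np.abs(eigvals))[::-1][:rank]
    vals, vecs = eigvals[top], eigvecs[:, top]
    # Project rhs onto the kept eigenvectors, divide by their eigenvalues, map back.
    return vecs @ ((vecs.T @ rhs) / vals)


# Usage on an exactly singular toy "Hessian".
h = np.array([[1.0, 1.0], [1.0, 1.0]])
g = np.array([1.0, 1.0])
x, used = solve_with_retry(h, g)
print(used)  # 1e-08

h3 = np.diag([4.0, 2.0, 0.0])
print(truncated_inverse_apply(h3, np.array([4.0, 2.0, 5.0]), rank=2))  # approximately [1, 1, 0]
```

Keep in mind that `np.linalg.solve` only raises when a pivot is exactly zero; a nearly singular matrix may "succeed" and return a huge, meaningless solution, so checking `np.linalg.cond` can be a useful extra guard. In the truncated version, choosing `rank` too large reintroduces tiny eigenvalues and their huge reciprocals, which is why the rank estimate matters.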

@ntheol I'll close the issue for now. Feel free to start a discussion on this topic if you have more questions.