Error when trying to compute influences for different subset of training data

ntheol commented 8 months ago

Hello!

I am trying to compute influences for different subsets of the training data, and sometimes I get an error that I can't explain. This is the error:

_LinAlgError                              Traceback (most recent call last)
C:\Users\NIKOLA~1\AppData\Local\Temp/ipykernel_12584/239428499.py in <module>
----> 1 train_influences = compute_influences(
      2     TorchTwiceDifferentiable(nn_model, F.binary_cross_entropy),
      3     training_data=training_data_loader,
      4     test_data=val_data_loader,
      5     influence_type="up",

c:\Users\Nikolas Theol\AppData\Local\Programs\Python\Python38\lib\site-packages\pydvl\influence\general.py in compute_influences(differentiable_model, training_data, test_data, input_data, inversion_method, influence_type, hessian_regularization, progress, **kwargs)
    307         test_data = deepcopy(training_data)
    308 
--> 309     influence_factors, _ = compute_influence_factors(
    310         differentiable_model,
    311         training_data,

c:\Users\Nikolas Theol\AppData\Local\Programs\Python\Python38\lib\site-packages\pydvl\influence\general.py in compute_influence_factors(model, training_data, test_data, inversion_method, hessian_perturbation, progress, **kwargs)
    117         rhs = cat(list(test_grads()))
    118 
--> 119     return solve_hvp(
    120         inversion_method,
    121         model,

c:\Users\Nikolas Theol\AppData\Local\Programs\Python\Python38\lib\site-packages\pydvl\influence\inversion.py in solve_hvp(inversion_method, model, training_data, b, hessian_perturbation, **kwargs)
     65     """
     66 
---> 67     return InversionRegistry.call(
     68         inversion_method,
     69         model,

c:\Users\Nikolas Theol\AppData\Local\Programs\Python\Python38\lib\site-packages\pydvl\influence\inversion.py in call(cls, inversion_method, model, training_data, b, hessian_perturbation, **kwargs)
    201         """
    202 
--> 203         return cls.get(type(model), inversion_method)(
    204             model, training_data, b, hessian_perturbation, **kwargs
    205         )

c:\Users\Nikolas Theol\AppData\Local\Programs\Python\Python38\lib\site-packages\pydvl\influence\inversion.py in wrapper(*args, **kwargs)
    154             @functools.wraps(func)
    155             def wrapper(*args, **kwargs):
--> 156                 return func(*args, **kwargs)
    157 
    158             cls.registry[key] = wrapper

c:\Users\Nikolas Theol\AppData\Local\Programs\Python\Python38\lib\site-packages\pydvl\influence\torch\torch_differentiable.py in solve_linear(model, training_data, b, hessian_perturbation)
    528     )
    529     info = {"hessian": hessian}
--> 530     return InverseHvpResult(x=torch.linalg.solve(matrix, b.T).T, info=info)
    531 
    532 

_LinAlgError: torch.linalg.solve: The solver failed because the input matrix is singular.

It's a LinAlgError saying that the input matrix is singular, meaning that det(matrix) = 0 and the matrix cannot be inverted. I looked in torch_differentiable.py to see how the matrix is computed and, if I'm reading it right, it is done like this:

    hessian = model.hessian(torch.cat(all_x), torch.cat(all_y))
    matrix = hessian + hessian_perturbation * torch.eye(model.num_params, device=model.device)

As an experiment, I copied these lines to compute the matrix myself and check whether the determinant is actually zero. I found that even when compute_influences runs without this error, the determinant is still 0. The same happens with your example notebooks: the determinant of the matrix is zero, yet compute_influences does not raise the error, which seems odd.
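One possible explanation for a zero determinant without a solver error, not confirmed anywhere in this thread, is floating-point underflow: the determinant is the product of all eigenvalues, so for a matrix with many parameters it can underflow to 0.0 even when the matrix is comfortably invertible. The sketch below illustrates this with NumPy as a stand-in for torch; the matrix is a made-up example, not a real Hessian.

```python
import numpy as np

n = 500
# A clearly invertible, perfectly conditioned matrix: 0.1 times the identity.
matrix = 0.1 * np.eye(n)

# The exact determinant is 0.1**500 = 1e-500, which is far below the smallest
# positive float64, so the computed value underflows to 0.0.
det = np.linalg.det(matrix)

# slogdet returns the sign and log|det| separately and does not underflow.
sign, logabsdet = np.linalg.slogdet(matrix)

# The condition number is a more informative indicator of near-singularity.
cond = np.linalg.cond(matrix)

# The linear solve succeeds even though det evaluates to zero.
b = np.ones(n)
x = np.linalg.solve(matrix, b)

print(det, sign, logabsdet, cond)
```

Under this assumption, checking the smallest eigenvalue or the condition number of the matrix says more about whether the solve will work than checking the determinant.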

Can you tell me why this is happening, or what I might be doing wrong and how I can change it?

Thank you!

schroedk commented 8 months ago

@ntheol It is commonly observed that the Hessians of deep neural nets become numerically singular at some point. The parameter hessian_perturbation is there to avoid directly solving a singular system: adding hessian_perturbation times the identity shifts every eigenvalue, including the lowest one, away from zero by that amount. The default value is 0.0, so please try adding a small perturbation. Please let me know if this explanation helps.
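As a minimal sketch of the effect described above, the following uses NumPy in place of torch and a made-up 2x2 matrix as a stand-in for a Hessian. Only the name hessian_perturbation and the formula hessian + hessian_perturbation * I come from the thread; the matrix and the value 1e-3 are illustrative assumptions.

```python
import numpy as np

# Hypothetical stand-in for a Hessian: symmetric and exactly singular
# (its eigenvalues are 0 and 2).
hessian = np.array([[1.0, 1.0],
                    [1.0, 1.0]])
b = np.array([1.0, 0.0])

# Without a perturbation the direct solve fails, as in the traceback.
try:
    np.linalg.solve(hessian, b)
    unperturbed_ok = True
except np.linalg.LinAlgError:
    unperturbed_ok = False

# Adding a multiple of the identity shifts every eigenvalue by that amount.
hessian_perturbation = 1e-3  # illustrative value, not a recommendation
matrix = hessian + hessian_perturbation * np.eye(2)

eig_before = np.linalg.eigvalsh(hessian)
eig_after = np.linalg.eigvalsh(matrix)

# The perturbed system is no longer singular and can be solved.
x = np.linalg.solve(matrix, b)

print(unperturbed_ok, eig_before, eig_after)
```

Note that the solution of the perturbed system is not the solution of the original one, so a larger perturbation buys stability at the cost of accuracy.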

ntheol commented 8 months ago

@schroedk Thank you, I think it's okay now!

ntheol commented 8 months ago

@schroedk Just out of curiosity, what values do you consider good for hessian_perturbation? Should it be something like 0.5, or smaller?

mdbenito commented 8 months ago

The perturbation is there to nudge small eigenvalues of the Hessian away from zero. Without knowledge of the spectrum, or at least some bound on the lowest eigenvalue, it is not possible to determine a good value in advance. One thing you can try is to perform the inversion with some perturbation and, if it fails, try again with the perturbation doubled, and so on until it no longer fails. However, it can be preferable to avoid the direct inversion method and use the Arnoldi iteration instead, since it approximates the dominant eigenvalues and avoids the lowest ones (although one has to select a rank estimate properly... to avoid the vanishing eigenvalues).
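The doubling strategy suggested above could be sketched roughly as follows. This uses NumPy rather than torch, and solve_with_doubling is a hypothetical helper written for illustration, not a pyDVL function; the test matrix is contrived so that the first attempt fails.

```python
import numpy as np


def solve_with_doubling(hessian, b, initial_perturbation, max_tries=20):
    """Solve (hessian + lam * I) x = b, doubling lam after each solver failure.

    Hypothetical helper for illustration only; it is not part of pyDVL.
    Returns the solution and the perturbation that was finally used.
    """
    identity = np.eye(hessian.shape[0])
    lam = initial_perturbation
    for _ in range(max_tries):
        try:
            return np.linalg.solve(hessian + lam * identity, b), lam
        except np.linalg.LinAlgError:
            lam *= 2.0  # the solver reported a singular matrix: double and retry
    raise RuntimeError(f"still singular after {max_tries} attempts")


# Contrived symmetric matrix with eigenvalues -0.25 and 1.75, chosen so that
# the first perturbation (0.25) produces an exactly singular system.
hessian = np.array([[0.75, 1.0],
                    [1.0, 0.75]])
b = np.array([1.0, 0.0])

x, lam = solve_with_doubling(hessian, b, initial_perturbation=0.25)
print(lam)
```

One caveat with this approach: a direct solver typically reports failure only when it hits an exactly zero pivot, so a solve that succeeds can still be badly conditioned. Checking the condition number of the perturbed matrix as well may be a useful extra safeguard.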

mdbenito commented 8 months ago

@ntheol I'll close the issue for now. Feel free to start a discussion on this topic if you have more questions.

aai-institute / pyDVL

Error when trying to compute influences for different subset of training data #447