amirgholami / PyHessian

PyHessian is a Pytorch library for second-order based analysis and training of Neural Networks
MIT License
675 stars 116 forks

Unexpected `shape` issue in Hessian-Vector computation #8

Open stalhabukhari opened 3 years ago

stalhabukhari commented 3 years ago

Hi!

Thank you for making the source code of your work available. I tried to use the library for an application involving a 3D network architecture and ran into the following issue:

********** Commencing Hessian Computation **********
Traceback (most recent call last):
  File "hessian_analysis.py", line 181, in <module>
    hessianObj.analyze(model_checkpoint_filepath)
  File "/media/ee/DATA/Repositories/PyHessian/hessian_analysis.py", line 70, in analyze
    top_eigenvalues, top_eigenvectors  = hessian_comp.eigenvalues(top_n=self.top_n)
  File "/media/ee/DATA/Repositories/PyHessian/pyhessian/hessian.py", line 167, in eigenvalues
    Hv = hessian_vector_product(self.gradsH, self.params, v)
  File "/media/ee/DATA/Repositories/PyHessian/pyhessian/utils.py", line 88, in hessian_vector_product
    retain_graph=True)
  File "/home/ee/anaconda3/envs/torch13/lib/python3.6/site-packages/torch/autograd/__init__.py", line 197, in grad
    grad_outputs_ = _make_grads(outputs, grad_outputs_)
  File "/home/ee/anaconda3/envs/torch13/lib/python3.6/site-packages/torch/autograd/__init__.py", line 32, in _make_grads
    if not out.shape == grad.shape:
AttributeError: 'float' object has no attribute 'shape'

Interestingly, the issue does not occur at the first call to back-propagation via loss.backward(), but rather at the subsequent call to torch.autograd.grad().

I believe that the float object in question is the 0. manually inserted when param.grad is None in the following routine:

https://github.com/amirgholami/PyHessian/blob/c2e49d2a735107a5d7ce2917d357d7a39b409fa4/pyhessian/utils.py#L61-L72
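For context, the linked routine is (paraphrased) roughly the following sketch; note how a bare Python float `0.` ends up in `grads` whenever a parameter has no gradient:

```python
import torch
import torch.nn as nn

def get_params_grad(model):
    """Collect trainable parameters and their gradients.
    Note: appends a plain float 0. when param.grad is None."""
    params, grads = [], []
    for param in model.parameters():
        if not param.requires_grad:
            continue
        params.append(param)
        grads.append(0. if param.grad is None else param.grad + 0.)
    return params, grads

# Before any backward pass, every param.grad is None,
# so every entry of grads is the float 0.
model = nn.Linear(3, 2)
params, grads = get_params_grad(model)
print(grads)  # [0.0, 0.0]
```

These float entries later flow into `torch.autograd.grad`, where PyTorch expects tensors with a `.shape`.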

If I am right, it is even more mind-boggling that a plain Python float is able to pass PyTorch's data-type checks at all (I had mistakenly mixed up the outputs and inputs arguments of torch.autograd.grad). Kindly advise on what I can do here.
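The failure can be reproduced in isolation with a hypothetical sketch like the one below: a list of first-order gradients containing a bare float `0.` (standing in for `gradsH` when a parameter received no gradient) is passed as `outputs` to `torch.autograd.grad`. On PyTorch 1.3 this raises the `AttributeError` from the traceback; newer versions may raise a different exception type at the same check.

```python
import torch

x = torch.randn(3, requires_grad=True)
loss = (x * x).sum()

# First-order gradients with create_graph=True, as in a
# Hessian-vector-product setup.
grads = torch.autograd.grad(loss, [x], create_graph=True)

# Mimic gradsH containing a bare float 0. for a parameter
# that received no gradient.
outputs = list(grads) + [0.]
v = [torch.randn(3), torch.randn(2)]

try:
    torch.autograd.grad(outputs, [x], grad_outputs=v, retain_graph=True)
except (AttributeError, TypeError, RuntimeError) as e:
    # e.g. AttributeError: 'float' object has no attribute 'shape'
    print(type(e).__name__, e)
```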

P.S. hessian_analysis.py is a wrapper I wrote around the library for my use-case. I verified the wrapper by running a 2-layer neural network on a regression task.

liujingcs commented 2 years ago

Hi, I have faced the same issue. Have you solved this issue?

stalhabukhari commented 2 years ago

@liujingcs Ah! It has been a long while. I think I upgraded the PyTorch version (probably 1.8).

If nothing works, you may want to check out this work by the same group: https://github.com/noahgolmant/pytorch-hessian-eigenthings

hubery1619 commented 1 year ago

> Hi, I have faced the same issue. Have you solved this issue?

Hi, I wonder if you have solved this issue. Thanks so much.

yxiao54 commented 1 year ago

Hi guys, I met the same issue and just figured it out. In my case, it was because some layers were defined in the model but did not participate in the forward or backward pass. The issue was fixed after I deleted the unused layers. Another way to resolve it is to modify the get_params_grad function in pyhessian/utils.py: when the grad is None, the entry appended to grads should be a tensor of zeros (matching the parameter's shape) instead of the float 0.
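A minimal sketch of that second fix, assuming the structure of `get_params_grad` in pyhessian/utils.py: substitute `torch.zeros_like(param)` for the float `0.`, so every entry in `grads` is a tensor whose shape matches its parameter.

```python
import torch
import torch.nn as nn

def get_params_grad(model):
    """Collect trainable parameters and their gradients, substituting
    a zero tensor (matching the parameter's shape) when a parameter
    has no gradient, e.g. an unused layer."""
    params, grads = [], []
    for param in model.parameters():
        if not param.requires_grad:
            continue
        params.append(param)
        grads.append(torch.zeros_like(param) if param.grad is None
                     else param.grad + 0.)
    return params, grads

# Even with no backward pass run (all param.grad are None),
# every gradient entry is now a tensor of the right shape.
model = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 1))
params, grads = get_params_grad(model)
print(all(g.shape == p.shape for p, g in zip(params, grads)))  # True
```

This keeps the downstream `torch.autograd.grad` shape check happy, since every output now has a `.shape`.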