f-dangel / backpack

BackPACK - a backpropagation package built on top of PyTorch which efficiently computes quantities other than the gradient.
https://backpack.pt/

AttributeError: 'BatchNorm2d' object has no attribute 'output' #253

Open · QiyaoWei opened this issue 2 years ago

QiyaoWei commented 2 years ago

I've posted the full error below. The MWE is a bit long (currently hundreds of lines) and I am still working on it, but is there any specific direction I should be looking in, given this error? It looks like BatchNorm is somehow mixed up in the gradient calculation (judging from the error message)?

Traceback (most recent call last):
  File "/Users/qiyaowei/DEQ-BNN/mwe.py", line 575, in <module>
    model(torch.rand(1,3,32,32)).sum().backward()
  File "/Users/qiyaowei/miniconda3/envs/jax/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/Users/qiyaowei/miniconda3/envs/jax/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/Users/qiyaowei/miniconda3/envs/jax/lib/python3.8/site-packages/torch/utils/hooks.py", line 110, in hook
    res = user_hook(self.module, grad_input, self.grad_outputs)
  File "/Users/qiyaowei/miniconda3/envs/jax/lib/python3.8/site-packages/backpack/__init__.py", line 209, in hook_run_extensions
    backpack_extension(module, g_inp, g_out)
  File "/Users/qiyaowei/miniconda3/envs/jax/lib/python3.8/site-packages/backpack/extensions/backprop_extension.py", line 127, in __call__
    module_extension(self, module, g_inp, g_out)
  File "/Users/qiyaowei/miniconda3/envs/jax/lib/python3.8/site-packages/backpack/extensions/module_extension.py", line 97, in __call__
    delete_old_quantities = not self.__should_retain_backproped_quantities(module)
  File "/Users/qiyaowei/miniconda3/envs/jax/lib/python3.8/site-packages/backpack/extensions/module_extension.py", line 162, in __should_retain_backproped_quantities
    is_a_leaf = module.output.grad_fn is None
  File "/Users/qiyaowei/miniconda3/envs/jax/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BatchNorm2d' object has no attribute 'output'
fKunstner commented 2 years ago

Hi Qiyao,

Beware of BatchNorm; most of the quantities returned by BackPACK are not defined when there's a batchnorm layer in the middle (see e.g. https://github.com/f-dangel/backpack/issues/239).

Easy things to check that could cause an error like this: calling backward twice (the first backward clears the graph, so the second one crashes), or a missing call to backpack.extend(model). But neither seems to be the case here.
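
For reference, the usual pattern looks roughly like this (just a sketch, with a toy Sequential model standing in for yours):

    import torch
    from backpack import backpack, extend
    from backpack.extensions import BatchGrad

    model = extend(torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)))
    loss = model(torch.rand(8, 3, 32, 32)).sum()

    with backpack(BatchGrad()):
        loss.backward()  # backward is called exactly once, on an extended model

    for p in model.parameters():
        print(p.grad_batch.shape)  # one gradient per sample, e.g. torch.Size([8, 10, 3072]) for the weight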

Only a rough guess from looking at the stack, but the error might be specific to BatchNorm. It occurs after the backward pass has been computed, during cleanup (delete_old_quantities = not self.__should_retain_backproped_quantities(module)). The error 'BatchNorm2d' object has no attribute 'output' indicates that the extension needed to store additional quantities during the forward pass (the output of the layer) but did not. This is odd; I would expect it to crash much earlier. Which extension are you running with BatchNorm?

QiyaoWei commented 2 years ago
  1. Hmmm, is there currently an alternative to BatchNorm? I guess it would be safest to just stick to linear and conv layers + activations, although the accuracy will surely decrease in that case.
  2. Yeah, I don't think I am calling backward twice, and I made sure to add model = extend(model). BTW, the documentation page also recommended trying use_converter=True (sketched after this list), but I guess that one has its own bugs, so I did not dig deeper.
  3. Even though I don't have the full MWE ready, the code that triggers the error is easy to share:

    import torch
    from backpack import backpack, extend
    from backpack.extensions import BatchGrad

    model = get_cls_net()
    model = extend(model)
    with backpack(BatchGrad()):
        model(torch.rand(1, 3, 32, 32)).sum().backward()

    The weird thing is, BatchNorm worked with this code when I was trying it on a smaller model, so right now I am trying to sort out the structural differences between the two models to see if I can find anything useful.
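
As an aside, my understanding is that the use_converter variant mentioned in point 2 would be called roughly like the sketch below; I have not verified whether it helps here.

    import torch
    from backpack import backpack, extend
    from backpack.extensions import BatchGrad

    # use_converter=True asks extend to convert the model before adding BackPACK hooks
    model = extend(get_cls_net(), use_converter=True)

    with backpack(BatchGrad()):
        model(torch.rand(1, 3, 32, 32)).sum().backward()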

fKunstner commented 2 years ago

is there currently an alternative to BatchNorm?

There are, for example, GroupNorm or LayerNorm (see https://pytorch.org/docs/stable/nn.html#normalization-layers). The problem with BatchNorm is that there are no "individual gradients"; it is not possible to isolate the contribution of one sample to the loss, because BatchNorm mixes the samples.
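
A quick way to see the mixing, in plain PyTorch (sketch):

    import torch

    bn = torch.nn.BatchNorm2d(3).train()  # training mode: normalize with the current batch's statistics
    x = torch.rand(4, 3, 8, 8)

    out_in_batch = bn(x)[0]   # sample 0, normalized with the statistics of all 4 samples
    out_alone = bn(x[:1])[0]  # sample 0, normalized with its own statistics only

    print(torch.allclose(out_in_batch, out_alone))  # False: the other samples change sample 0's output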

What's the model (get_cls_net)?

QiyaoWei commented 2 years ago

Oh, I thought BackPACK doesn't support GroupNorm.

BTW, I might have figured out the issue: it goes away when I add an eval call, i.e. extend(model).eval(). Not sure why, but I guess that is a fix!
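
For completeness, the version that runs for me looks roughly like this (get_cls_net as in the snippet above; my guess is that in eval mode BatchNorm normalizes with the running statistics, so samples no longer influence each other):

    import torch
    from backpack import backpack, extend
    from backpack.extensions import BatchGrad

    model = extend(get_cls_net()).eval()  # eval(): BatchNorm uses running stats, not batch stats

    with backpack(BatchGrad()):
        model(torch.rand(1, 3, 32, 32)).sum().backward()

    for name, p in model.named_parameters():
        print(name, p.grad_batch.shape)  # individual gradients, one per sample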