f-dangel / backpack

BackPACK - a backpropagation package built on top of PyTorch which efficiently computes quantities other than the gradient.
https://backpack.pt/
MIT License

[not confirmed] `AttributeError` related to BackPACK's IO #212

Closed littleolex closed 3 years ago

littleolex commented 3 years ago

AttributeError: 'Conv2d' object has no attribute 'input0'

f-dangel commented 3 years ago

Hi, thanks for reaching out.

Could you provide more details on how you obtained this error? For instance, a small code snippet that reproduces the issue.

Best

littleolex commented 3 years ago

[screenshot of the traceback]

littleolex commented 3 years ago

I use torch 1.8.0

f-dangel commented 3 years ago

The error is caused by one of BackPACK's extensions trying to access information stored during the forward pass. The traceback you posted shows where the problem occurs, but to track down its cause, a functioning piece of code that runs into exactly this error is required.
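
For context, since the reproducing code was not posted: a common trigger for this kind of error is running a BackPACK extension on a module that was not wrapped with extend, so the forward pass never stores the quantities the extension needs (such as input0). A minimal sketch of the intended usage, with a toy model and random data as placeholders:

from torch import nn, randn, randint
from backpack import backpack, extend
from backpack.extensions import BatchGrad

# the model must be extended so that BackPACK can store the layer inputs (input0)
# during the forward pass; the loss is extended as well, as in the official examples
model = extend(nn.Sequential(nn.Conv2d(3, 8, 3), nn.Flatten(), nn.Linear(8 * 30 * 30, 10)))
lossfunc = extend(nn.CrossEntropyLoss())

X, y = randn(4, 3, 32, 32), randint(0, 10, (4,))  # placeholder data
loss = lossfunc(model(X), y)                      # forward pass through the extended modules

with backpack(BatchGrad()):  # any extension that relies on the stored IO
    loss.backward()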

littleolex commented 3 years ago

I want to ask a question: if the loss of an example must be calculated with the whole batch, and the model outputs a loss tensor whose size is the batch size (so loss[0] is the loss of the first example), can I still use your method to get the individual gradients?

f-dangel commented 3 years ago

If you want to compute gradients for loss[0], loss[1], ... in parallel, try


from backpack import backpack, extend
from backpack.extensions import BatchGrad

# ... extend the model, do a forward pass -> loss

# compute the gradients of `loss[0], loss[1], ...`
with backpack(BatchGrad()):
    # loss has shape [N]
    loss.sum(0).backward()
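
After the backward call, each parameter of the extended model carries a grad_batch attribute with the individual gradients (a quick way to inspect them, assuming the setup above):

# grad_batch has shape [N, *p.shape]; grad_batch[i] belongs to sample i
for name, p in model.named_parameters():
    print(name, p.grad_batch.shape)
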
littleolex commented 3 years ago

Then how can I get the individual gradients, in the same way as in the tutorial? Another question: I didn't find the function to replace the BatchNorm layer, but I heard it really exists. Can you tell me how to do it?
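
Regarding the BatchNorm question, for context: BackPACK does not support BatchNorm in training mode for per-sample quantities, since batch normalization mixes the samples of a batch and individual gradients are then not well defined. A common workaround (plain PyTorch, not a BackPACK API; the helper name and num_groups value below are illustrative) is to swap BatchNorm layers for a per-sample normalization such as GroupNorm before extending the model:

import torch.nn as nn

def replace_batchnorm_with_groupnorm(module, num_groups=8):
    # recursively swap BatchNorm2d layers for GroupNorm (illustrative helper, not part of BackPACK)
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            # num_groups must divide the channel count of the replaced layer
            setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
        else:
            replace_batchnorm_with_groupnorm(child, num_groups)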

littleolex commented 3 years ago

I found a strange thing: when I use your optimizer to update the parameters, even without clipping the gradients or adding noise, I cannot reach the same accuracy as with the original optimizer.

littleolex commented 3 years ago

scaling_factors = torch.clamp_max(l2_norms / C, 1.0)
clipped_grads = p.grad_batch * make_broadcastable(scaling_factors, p.grad_batch)

These two lines show that if the gradient's norm is smaller than C, the gradient gets multiplied by a factor smaller than 1. That seems wrong.
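
For reference, the clipping rule from the DP-SGD literature scales each individual gradient by min(1, C / ||g_i||), so gradients whose norm is already below C are left unchanged. A sketch of that rule on top of BackPACK's grad_batch (C and model are stand-ins carried over from the discussion above):

import torch

C = 1.0  # clipping threshold (illustrative value)

# per-sample L2 norms over all parameters, shape [N]
l2_norms = torch.cat(
    [p.grad_batch.flatten(start_dim=1) for p in model.parameters() if p.requires_grad],
    dim=1,
).norm(dim=1)

# scale_i = min(1, C / ||g_i||): norms below C give a factor of 1 (unchanged),
# norms above C are scaled down so the clipped gradient has norm C
scaling_factors = torch.clamp_max(C / l2_norms, 1.0)

clipped_grads = {
    name: p.grad_batch * scaling_factors.view((-1,) + (1,) * (p.grad_batch.dim() - 1))
    for name, p in model.named_parameters()
    if p.requires_grad
}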

schaefertim commented 3 years ago

Hi, I am also working on BackPACK and I'd like to help you. However, I have difficulty following the open questions in this thread.

I suggest that you ask your open questions in this form:

  1. How do I resolve this error: YourError?
    • error message
    • code that reproduces the error
    • optionally, the purpose of the code
  2. How can I compute the following?
    • description of what you want to compute
    • what you tried
  3. ...

littleolex commented 3 years ago

Thanks so much, but I have found another way to achieve gradient clipping.

f-dangel commented 3 years ago

I will go ahead and close this.