f-dangel / backpack

BackPACK - a backpropagation package built on top of PyTorch which efficiently computes quantities other than the gradient.
https://backpack.pt/
MIT License

Support for LayerNorm #263

Open Niccolo-Ajroldi opened 1 year ago

Niccolo-Ajroldi commented 1 year ago

I was trying to extend a Vision Transformer model using backpack. However, I encountered the following warning:

UserWarning: Extension saving to grad_batch does not have an extension for Module <class 'torch.nn.modules.normalization.LayerNorm'> although the module has parameters

I know that torch.nn.BatchNormNd leads to ill-defined first-order quantities and hence is not implemented here. Does the same hold for Layer Normalization?

Thank you in advance!

f-dangel commented 1 year ago

Hi,

thanks for your question. The warning you get for LayerNorm appears because BackPACK currently does not support this layer.

In contrast to BatchNorm, however, LayerNorm treats each sample in a mini-batch independently: the mean and variance used to normalize a sample are computed along its feature dimensions, whereas for BatchNorm they are computed along the batch dimension. Hence, first-order quantities like individual gradients are well-defined.
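To illustrate that difference, here is a minimal sketch comparing the normalization statistics, assuming a 2D input of shape (batch, features):

```python
import torch

x = torch.randn(4, 8)  # 4 samples, 8 features

# LayerNorm statistics: one mean/variance per sample, over its features.
ln_mean = x.mean(dim=1, keepdim=True)                 # shape (4, 1), independent across samples
ln_var = x.var(dim=1, unbiased=False, keepdim=True)

# BatchNorm statistics: one mean/variance per feature, over the batch.
bn_mean = x.mean(dim=0, keepdim=True)                 # shape (1, 8), couples the samples
bn_var = x.var(dim=0, unbiased=False, keepdim=True)
```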

To add support for LayerNorm, the custom module example from the documentation is a good starting point. It describes how to write BackPACK extensions for new layers (the "Custom module extension" part is the most relevant).
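As a rough, untested sketch of what such an extension could look like for the individual-gradients (BatchGrad) case: the class name `LayerNormBatchGrad` is made up here, and the per-sample gradients are obtained by re-normalizing the input that BackPACK stores on the module during the forward pass.

```python
import torch.nn.functional as F

from backpack.extensions.firstorder.base import FirstOrderModuleExtension


class LayerNormBatchGrad(FirstOrderModuleExtension):
    """Sketch: individual gradients for LayerNorm's weight and bias (untested)."""

    def __init__(self):
        # Parameters for which individual gradients are computed.
        super().__init__(params=["weight", "bias"])

    @staticmethod
    def _normalized_input(module):
        # Re-compute x_hat (normalization without the affine parameters)
        # from the input BackPACK stored during the forward pass.
        return F.layer_norm(module.input0, module.normalized_shape, eps=module.eps)

    @staticmethod
    def _sum_dims(module):
        # Sum over all dims except the batch dim (0) and the trailing
        # dims covered by normalized_shape.
        return list(range(1, module.input0.dim() - len(module.normalized_shape)))

    def weight(self, ext, module, g_inp, g_out, bpQuantities):
        # y = weight * x_hat + bias, so per-sample dL/dweight = sum(g_out * x_hat).
        grad = g_out[0] * self._normalized_input(module)
        dims = self._sum_dims(module)
        return grad.sum(dims) if dims else grad

    def bias(self, ext, module, g_inp, g_out, bpQuantities):
        # Per-sample dL/dbias = sum(g_out) over the non-batch, non-normalized dims.
        dims = self._sum_dims(module)
        return g_out[0].sum(dims) if dims else g_out[0]
```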

I'd be happy to help merge a PR.

Best, Felix

KOVVURISATYANARAYANAREDDY commented 1 year ago

Any update on this?

f-dangel commented 1 year ago

No progress, and I don't have the capacity to work on this feature.

To break things down further, adding limited support for LayerNorm, e.g. only the BatchGrad extension, would be a feasible starting point. This can be achieved by following the example in the docs mentioned above.
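For completeness, a hypothetical sketch of how such an extension could then be registered and used, following the custom-module example in the docs; `LayerNormBatchGrad` refers to the sketch earlier in this thread and is not part of BackPACK:

```python
import torch
from torch import nn

from backpack import backpack, extend
from backpack.extensions import BatchGrad

# Register the (hypothetical) LayerNorm extension from the sketch above.
extension = BatchGrad()
extension.set_module_extension(nn.LayerNorm, LayerNormBatchGrad())

model = extend(nn.Sequential(nn.LayerNorm(8), nn.Linear(8, 1)))
loss_fn = extend(nn.MSELoss())

x, y = torch.randn(4, 8), torch.randn(4, 1)
loss = loss_fn(model(x), y)

with backpack(extension):
    loss.backward()

# Individual gradients, one per sample.
print(model[0].weight.grad_batch.shape)  # expected: torch.Size([4, 8])
print(model[0].bias.grad_batch.shape)    # expected: torch.Size([4, 8])
```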