Feature request: `BatchL2Grad` for `LayerNorm`

f-dangel / backpack

BackPACK - a backpropagation package built on top of PyTorch which efficiently computes quantities other than the gradient.

https://backpack.pt/

MIT License

Feature request: `BatchL2Grad` for `LayerNorm` #327

Open f-dangel opened 2 months ago

f-dangel commented 2 months ago

Documenting this feature request from @mf-silva. Supporting per-sample L2 gradient norms (`BatchL2Grad`) for `LayerNorm` would make it possible to estimate importance scores for data points on LLM architectures, which often contain `LayerNorm` layers. A good starting point for an implementation is the custom first-order extension example in the docs.
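To make the target quantity concrete, here is a minimal sketch in plain PyTorch of the per-sample squared L2 norms such an extension would need to produce for the `weight` and `bias` of `LayerNorm`. It is not BackPACK code: `layernorm_batch_l2` is a hypothetical helper name, and the sketch assumes an affine `LayerNorm`, a batch axis at dimension 0, a loss that is a sum over samples, and the squared-norm convention. In an actual first-order module extension, the layer input and the gradient with respect to the layer output would be supplied by the backward hook rather than computed by hand.

```python
import torch
from torch import nn


def layernorm_batch_l2(module: nn.LayerNorm, x: torch.Tensor, grad_out: torch.Tensor):
    """Per-sample squared L2 norms of the LayerNorm weight and bias gradients.

    Hypothetical helper for illustration, not part of BackPACK.
    x:        layer input, shape [N, *, *normalized_shape]
    grad_out: gradient of the summed loss w.r.t. the layer output, same shape
    """
    n_norm = len(module.normalized_shape)
    norm_dims = tuple(range(x.dim() - n_norm, x.dim()))

    # Recompute the normalized input x_hat = (x - mean) / sqrt(var + eps).
    mean = x.mean(dim=norm_dims, keepdim=True)
    var = x.var(dim=norm_dims, unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + module.eps)

    # Per-sample gradients: sum over axes between the batch axis and the
    # normalized axes (e.g. the sequence axis), but NOT over the batch axis.
    grad_w = grad_out * x_hat
    grad_b = grad_out
    extra_dims = tuple(range(1, x.dim() - n_norm))
    if extra_dims:
        grad_w = grad_w.sum(dim=extra_dims)
        grad_b = grad_b.sum(dim=extra_dims)

    return {
        "weight": grad_w.flatten(start_dim=1).pow(2).sum(dim=1),
        "bias": grad_b.flatten(start_dim=1).pow(2).sum(dim=1),
    }


# Check against a slow reference that differentiates each sample's loss separately.
torch.manual_seed(0)
N, T, D = 4, 3, 5
ln = nn.LayerNorm(D)
with torch.no_grad():  # use non-trivial parameters for the check
    ln.weight.uniform_(0.5, 1.5)
    ln.bias.normal_()

x = torch.randn(N, T, D)
y = ln(x)
loss_per_sample = y.pow(2).flatten(start_dim=1).sum(dim=1)
grad_out = torch.autograd.grad(loss_per_sample.sum(), y, retain_graph=True)[0]

batch_l2 = layernorm_batch_l2(ln, x, grad_out)

ref = {"weight": [], "bias": []}
for n in range(N):
    g_w, g_b = torch.autograd.grad(
        loss_per_sample[n], (ln.weight, ln.bias), retain_graph=True
    )
    ref["weight"].append(g_w.pow(2).sum())
    ref["bias"].append(g_b.pow(2).sum())
ref = {name: torch.stack(vals) for name, vals in ref.items()}

for name in ("weight", "bias"):
    print(name, torch.allclose(batch_l2[name], ref[name], rtol=1e-4, atol=1e-6))
```

The per-sample loop is only there as a correctness check. The vectorized helper avoids it by keeping the batch axis while reducing all other axes, which is the part a `LayerNorm` extension would need to implement.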