Documenting this feature request from @mf-silva: supporting per-sample L2 gradient norms for `LayerNorm` would allow estimating importance scores for data points on LLM architectures, which often contain `LayerNorm` layers.
A good starting point for implementing this is the custom first-order extension example in the docs.
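To illustrate the requested quantity, here is a minimal sketch in plain PyTorch. It does not use BackPACK's extension API, and the helper name `layernorm_batch_l2` is hypothetical. It works from the layer input `x` and the gradient `grad_out` of the loss w.r.t. the layer output, roughly the quantities a first-order module extension has access to. It assumes `elementwise_affine=True` with a bias and the batch axis first. Since `y = x_hat * weight + bias` acts elementwise over the normalized axes, each sample's `weight` gradient is `grad_out * x_hat` and its `bias` gradient is `grad_out`, summed over any axes between the batch axis and the normalized axes (e.g. the sequence axis in a transformer).

```python
import torch
from torch import nn


def layernorm_batch_l2(module: nn.LayerNorm, x: torch.Tensor, grad_out: torch.Tensor) -> dict:
    """Squared L2 norms of per-sample gradients of a LayerNorm's weight and bias.

    Hypothetical helper (not part of BackPACK or PyTorch). Assumes
    elementwise_affine=True with a bias and the batch axis at dim 0.
    x:        module input, shape [N, ..., *normalized_shape]
    grad_out: dLoss/dOutput, same shape as x
    """
    n_norm = len(module.normalized_shape)
    norm_dims = tuple(range(x.dim() - n_norm, x.dim()))

    # Recompute the normalized input x_hat; nn.LayerNorm uses the biased variance.
    mean = x.mean(dim=norm_dims, keepdim=True)
    var = (x - mean).pow(2).mean(dim=norm_dims, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + module.eps)

    # y = x_hat * weight + bias, so per position dL/dweight = g * x_hat and dL/dbias = g.
    grad_w = grad_out * x_hat
    grad_b = grad_out

    # Sum over axes shared within one sample (e.g. sequence length), keeping the batch axis.
    shared_dims = tuple(range(1, x.dim() - n_norm))
    if shared_dims:  # nothing to sum for inputs of shape [N, *normalized_shape]
        grad_w = grad_w.sum(dim=shared_dims)
        grad_b = grad_b.sum(dim=shared_dims)

    # Squared L2 norm per sample, shape [N].
    return {
        "weight": grad_w.flatten(start_dim=1).pow(2).sum(dim=1),
        "bias": grad_b.flatten(start_dim=1).pow(2).sum(dim=1),
    }


# Usage: a [batch, sequence, hidden] input, as in a transformer block.
torch.manual_seed(0)
N, T, D = 4, 3, 5
ln = nn.LayerNorm(D).double()
with torch.no_grad():
    ln.weight.normal_()
    ln.bias.normal_()
x = torch.randn(N, T, D, dtype=torch.float64)
y = ln(x)
loss = y.sin().sum()
grad_out = torch.autograd.grad(loss, y)[0]
norms = layernorm_batch_l2(ln, x, grad_out)
print(norms["weight"].shape)  # torch.Size([4])
```

The sketch returns squared norms (take `.sqrt()` for the norms themselves), matching the convention of BackPACK's `BatchL2Grad` extension. A brute-force loop with one backward pass per sample gives the same numbers and is handy for testing; the closed form avoids those N extra passes.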