Closed: berlino closed this 1 week ago.
Previously, `groupnorm` assumed that LayerNorm is applied over all input axes except the batch and group axes. With this PR, either RMSNorm or LayerNorm can be applied along configurable axes of the input tensor.
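For illustration, here is a minimal NumPy sketch of the behavior described above. It is not the code from this PR. The function `group_norm` and its parameters `norm_type` and `norm_axes` are hypothetical names. The sketch also assumes a channels-last input and omits learnable scale and bias. With `norm_axes=None` it reduces over every axis except batch and group, which was the old behavior. Passing explicit axes, for example only the within-group channel axis, selects a different normalization scope.

```python
import numpy as np


def group_norm(x, num_groups, norm_type="layernorm", norm_axes=None, eps=1e-6):
    """Hypothetical group normalization with configurable norm type and axes.

    x: array of shape [batch, ..., channels]; channels must divide by num_groups.
    norm_axes: axes of the grouped tensor [batch, ..., num_groups, group_size]
      to compute statistics over. None reduces over every axis except the
      batch axis (0) and the group axis (-2).
    """
    *lead, channels = x.shape
    if channels % num_groups:
        raise ValueError(f"channels={channels} not divisible by num_groups={num_groups}")
    # Split the channel axis into [num_groups, group_size].
    grouped = x.reshape(*lead, num_groups, channels // num_groups)
    if norm_axes is None:
        group_axis = grouped.ndim - 2
        norm_axes = tuple(a for a in range(1, grouped.ndim) if a != group_axis)
    norm_axes = tuple(norm_axes)
    if norm_type == "layernorm":
        # LayerNorm: subtract the mean, then divide by the standard deviation.
        mean = grouped.mean(axis=norm_axes, keepdims=True)
        var = grouped.var(axis=norm_axes, keepdims=True)
        normed = (grouped - mean) / np.sqrt(var + eps)
    elif norm_type == "rmsnorm":
        # RMSNorm: no mean subtraction; divide by the root mean square.
        mean_sq = np.mean(np.square(grouped), axis=norm_axes, keepdims=True)
        normed = grouped / np.sqrt(mean_sq + eps)
    else:
        raise ValueError(f"unknown norm_type: {norm_type}")
    # Merge the group axes back into a single channel axis.
    return normed.reshape(x.shape)


x = np.random.default_rng(0).normal(size=(2, 5, 8))  # [batch, time, channels]
y_ln = group_norm(x, num_groups=2)  # stats over time and within-group channels
y_rms = group_norm(x, num_groups=2, norm_type="rmsnorm", norm_axes=(-1,))  # per position
```

Making the reduction axes a parameter keeps the old behavior as the default while letting callers choose a narrower scope, such as per-timestep normalization in streaming models.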
@ruomingp, do you know why the checks haven't finished yet?
I'm not sure. Maybe @markblee knows?