facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

[feat] Add DeepNorm/DeepNet residual path #227

Closed by blefaudeux 2 years ago

blefaudeux commented 2 years ago

🚀 Feature

See https://arxiv.org/abs/2203.00555v1 (DeepNet), which combines a dedicated weight initialization with a modified residual path. The residual path is already modular in xformers, so it should be possible to add this in a very clean way.

Motivation

Seems better all around according to the paper (more stable training of deep Transformers), so it is worth testing it out and exposing the option.

Pitch

Add another residual path definition alongside the existing preLN and postLN options.
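For illustration, here is a rough sketch of what such a residual path could look like in plain PyTorch. It is not the xformers API: `DeepNormResidual`, `deepnorm_coefficients` and `deepnorm_init_` are hypothetical names made up for this example. It assumes the encoder-only / decoder-only setting from the paper, where the residual is `LayerNorm(alpha * x + sublayer(x))` with `alpha = (2N)^(1/4)`, and the feed-forward, value and output projections use Xavier init with gain `beta = (8N)^(-1/4)` for an `N`-layer stack. The encoder-decoder case uses different constants and is left out.

```python
from typing import Tuple

import torch
from torch import nn


def deepnorm_coefficients(num_layers: int) -> Tuple[float, float]:
    """Return (alpha, beta) for an encoder-only or decoder-only stack.

    Assumption: single-stack constants from the DeepNet paper.
    """
    alpha = (2 * num_layers) ** 0.25
    beta = (8 * num_layers) ** -0.25
    return alpha, beta


class DeepNormResidual(nn.Module):
    """Hypothetical wrapper computing LayerNorm(alpha * x + sublayer(x))."""

    def __init__(self, sublayer: nn.Module, d_model: int, alpha: float):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Post-LN layout, but the skip connection is up-weighted by alpha.
        return self.norm(self.alpha * x + self.sublayer(x))


def deepnorm_init_(linear: nn.Linear, beta: float) -> None:
    """Xavier-normal init scaled by beta (hypothetical helper).

    Intended for FFN, value and output projections; query and key
    projections would keep gain 1.
    """
    nn.init.xavier_normal_(linear.weight, gain=beta)
    if linear.bias is not None:
        nn.init.zeros_(linear.bias)


# Usage: wrap a small feed-forward block for an 8-layer model.
alpha, beta = deepnorm_coefficients(num_layers=8)
ffn = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
for module in ffn:
    if isinstance(module, nn.Linear):
        deepnorm_init_(module, beta)

block = DeepNormResidual(ffn, d_model=64, alpha=alpha)
out = block(torch.randn(2, 10, 64))
```

Keeping the scaling inside a wrapper mirrors how preLN/postLN are typically expressed as wrappers around a sublayer, so the new path would only differ by the `alpha` factor and the init step.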

Alternatives

Not doing it

Additional context

Training stability issues are real, see

blefaudeux commented 2 years ago

cc @dianaml0 @fmassa