🚀 Feature
See https://arxiv.org/abs/2203.00555v1 (DeepNet), which combines an initialization scheme with a modified residual path. The residual path is already modular in xformers, so it should be possible to add this in a very clean way.
Motivation
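The paper reports the training stability of pre-LN together with the final quality of post-LN, so it seems better all around. Worth testing it out and exposing the option.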
Pitch
Add another residual path definition alongside the existing preLN/postLN ones.
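
A rough sketch of what the new path would compute, going by the paper: the residual is up-weighted by a constant `alpha` before a post-LN style normalization, and some sublayer weights are initialized with a gain `beta`. The function names below are hypothetical and not part of the xformers API; the constants are the encoder-only (or decoder-only) values from the paper, and the wrapper is duck-typed so it works on plain floats as well as tensors.

```python
import math
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def deepnorm_coefficients(num_layers: int) -> Tuple[float, float]:
    """Return (alpha, beta) for an N-layer encoder-only or decoder-only stack.

    Assumption: constants as given in the paper for the single-stack case,
    alpha = (2N)^(1/4) and beta = (8N)^(-1/4). Encoder-decoder models use
    different expressions, not covered here.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be >= 1")
    alpha = (2 * num_layers) ** 0.25
    beta = (8 * num_layers) ** -0.25
    return alpha, beta


def deepnorm_residual(
    x: T,
    sublayer: Callable[[T], T],
    norm: Callable[[T], T],
    alpha: float,
) -> T:
    """Hypothetical residual wrapper: norm(alpha * x + sublayer(x)).

    With alpha == 1 this reduces to the usual post-LN residual.
    """
    return norm(alpha * x + sublayer(x))


if __name__ == "__main__":
    alpha, beta = deepnorm_coefficients(num_layers=12)
    # beta would be used as the init gain for the feed-forward, value and
    # output projection weights; query/key projections keep a gain of 1.
    print(f"alpha={alpha:.4f} beta={beta:.4f}")
```

In xformers this would presumably become a third residual wrapper next to the pre-LN and post-LN ones, with `sublayer` being the attention or feed-forward block and `norm` a `torch.nn.LayerNorm`; the `beta` part needs a hook into weight initialization, which is separate from the residual path itself.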
Alternatives
Not doing it
Additional context
Training stability issues are real, see