bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Are there any other layer norm functions, such as RMSNorm or DeepNorm #364

Open lvcc2018 opened 1 year ago

lvcc2018 commented 1 year ago

Are there any other layer normalization functions available besides LayerNorm, such as RMSNorm or DeepNorm?
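For reference, here is a minimal sketch of how the two requested variants differ from standard LayerNorm. It is not Megatron-DeepSpeed code and does not reflect any API in this repository. It works on a single plain-Python vector, and every function name is hypothetical. RMSNorm drops the mean subtraction and rescales by the root mean square only. DeepNorm is not a new normalization function. It is a post-LN residual, `LN(alpha * x + f(x))`, with an up-weighted skip connection. The `alpha = (2N)^(1/4)` value below is assumed from the DeepNet paper's setting for decoder-only models. DeepNorm's companion initialization scaling (`beta`) is omitted here.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]


def layer_norm(x: Vector, gain: Optional[Vector] = None, eps: float = 1e-5) -> Vector:
    """Standard LayerNorm (no bias): subtract the mean, divide by the std."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    g = gain if gain is not None else [1.0] * n
    return [gi * (v - mean) / math.sqrt(var + eps) for gi, v in zip(g, x)]


def rms_norm(x: Vector, gain: Optional[Vector] = None, eps: float = 1e-6) -> Vector:
    """RMSNorm: no mean subtraction, only rescaling by the root mean square."""
    n = len(x)
    rms = math.sqrt(sum(v * v for v in x) / n + eps)
    g = gain if gain is not None else [1.0] * n
    return [gi * v / rms for gi, v in zip(g, x)]


def deepnorm_alpha(num_layers: int) -> float:
    """Assumed DeepNet residual scale for a decoder-only model: (2N)^(1/4)."""
    return (2 * num_layers) ** 0.25


def deepnorm_residual(x: Vector, sublayer: Callable[[Vector], Vector], alpha: float) -> Vector:
    """Post-LN residual with an up-weighted skip path: LN(alpha * x + f(x))."""
    fx = sublayer(x)
    return layer_norm([alpha * xi + fi for xi, fi in zip(x, fx)])
```

In practice the gain vectors are learned parameters. A real model would apply these per token over the hidden dimension of a tensor rather than on Python lists.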