Closed le1nux closed 5 months ago
RMSNorm is more compute-efficient than LayerNorm because it skips the mean-centering step and normalizes by the root mean square alone, as explained in the original paper: https://openreview.net/pdf?id=SygkZ3MTJE
Because of these benefits, RMSNorm has replaced LayerNorm in models such as Llama 2.
The Llama 2 implementation can be found here: https://github.com/facebookresearch/llama/blob/a0a4da8b497c566403941ceec47c2512ecf9dd20/llama/model.py#L34
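To make the difference concrete, here is a minimal pure-Python sketch of both normalizations over a single vector. It is not the code from #67 or from the Llama 2 repository; the function names, the `eps` default, and the list-based representation are assumptions chosen for illustration. A real implementation would operate on tensors over the last dimension.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then apply a per-element gain.

    eps is a hypothetical default added inside the square root for stability.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def layer_norm(x, weight, bias, eps=1e-6):
    """Reference LayerNorm: center by the mean, scale by the std, then gain and bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * w + b for v, w, b in zip(x, weight, bias)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    ones = [1.0] * len(x)
    zeros = [0.0] * len(x)
    print("RMSNorm:  ", rms_norm(x, ones))
    print("LayerNorm:", layer_norm(x, ones, zeros))
```

RMSNorm needs one reduction (the mean of squares) and no bias, while LayerNorm needs two reductions (mean and variance) plus a subtraction per element. For inputs that already have zero mean, the two produce the same result when the gain is 1 and the bias is 0.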
Implemented in #67.