URL

https://arxiv.org/abs/2410.01131
Affiliations
- Ilya Loshchilov, N/A
- Cheng-Ping Hsieh, N/A
- Simeng Sun, N/A
- Boris Ginsburg, N/A
Abstract
- We propose a novel neural network architecture, the normalized Transformer (nGPT) with representation learning on the hypersphere. In nGPT, all vectors forming the embeddings, MLP, attention matrices and hidden states are unit norm normalized. The input stream of tokens travels on the surface of a hypersphere, with each layer contributing a displacement towards the target output predictions. These displacements are defined by the MLP and attention blocks, whose vector components also reside on the same hypersphere. Experiments show that nGPT learns much faster, reducing the number of training steps required to achieve the same accuracy by a factor of 4 to 20, depending on the sequence length.
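The geometric idea in the abstract (unit-norm vectors, with each layer displacing the hidden state along the hypersphere) can be illustrated with a minimal sketch. It assumes a displacement is realized as a linear step toward a block's output followed by renormalization. The names `normalize`, `sphere_step`, `target`, and `alpha` are hypothetical and chosen for illustration only. This is a toy in pure Python, not the authors' implementation.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def sphere_step(h, target, alpha):
    """Move h toward target by a fraction alpha, then project back onto the unit sphere.

    Assumption: the displacement is a linear interpolation followed by
    renormalization; alpha is a hypothetical step size.
    """
    moved = [hi + alpha * (ti - hi) for hi, ti in zip(h, target)]
    return normalize(moved)


h = normalize([3.0, 4.0])        # hidden state on the unit circle
target = normalize([0.0, 1.0])   # hypothetical block output, also unit norm
h_next = sphere_step(h, target, alpha=0.5)
```

After the step, `h_next` still has unit norm and lies closer to `target` than `h` did, which is the sense in which a layer "contributes a displacement" while the state stays on the sphere.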
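Translation (by gpt-4o-mini)
We propose the normalized Transformer (nGPT), a new neural network architecture that performs representation learning on the hypersphere. In nGPT, all vectors forming the embeddings, MLP, attention matrices, and hidden states are normalized to unit norm. The input stream of tokens travels on the surface of the hypersphere, with each layer contributing a displacement toward the target output predictions. These displacements are defined by the MLP and attention blocks, whose vector components also reside on the same hypersphere. Experiments show that nGPT learns much faster, reducing the number of training steps required to reach the same accuracy by a factor of 4 to 20.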
Summary (by gpt-4o-mini)
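Proposes a new architecture, the normalized Transformer (nGPT). All vectors are normalized to unit norm, and tokens move on the surface of a hypersphere. nGPT uses its MLP and attention blocks to contribute toward the output predictions; it learns faster, cutting the number of required training steps by a factor of 4 to 20.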

nGPT: Normalized Transformer with Representation Learning on the Hypersphere, Ilya Loshchilov+, N/A, arXiv'24 #1465
