Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
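To make the ternary {-1, 0, 1} weight representation concrete, below is a minimal sketch of an absmean-style quantizer: scale a weight matrix by its mean absolute value, round, and clip to [-1, 1]. This is an illustrative assumption about how such a quantizer could look, not code from the paper; the function name, shapes, and the eps constant are hypothetical.

```python
import torch

def absmean_ternary_quantize(weight: torch.Tensor, eps: float = 1e-5):
    """Map a full-precision weight matrix to ternary values {-1, 0, +1}.

    Sketch of absmean-style quantization: divide by the mean absolute
    value, round to the nearest integer, and clip to [-1, 1]. The scale
    gamma is returned so matmul outputs can be rescaled afterwards.
    """
    gamma = weight.abs().mean().clamp(min=eps)          # per-tensor scale
    w_ternary = (weight / gamma).round().clamp_(-1, 1)  # values in {-1, 0, +1}
    return w_ternary, gamma

# Usage: quantize a random linear-layer weight and approximate the FP output.
w = torch.randn(256, 256)
x = torch.randn(8, 256)
w_q, gamma = absmean_ternary_quantize(w)
y_fp = x @ w.t()              # full-precision reference
y_q = (x @ w_q.t()) * gamma   # ternary matmul rescaled by gamma
```

Because the ternary matmul involves only additions, subtractions, and skipped zeros, it hints at why such models can be cheaper in latency and energy than FP16 baselines.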