Ignited by the breakthrough of ChatGPT, Transformer-based Large Language Models (LLMs) have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents. However, a prevailing limitation exists: many current LLMs, constrained by resources, are primarily pre-trained on shorter texts, rendering them less effective for the longer-context prompts commonly encountered in real-world settings. In this paper, we present a comprehensive survey focusing on the advancement of model architecture in Transformer-based LLMs to optimize long-context capabilities across all stages from pre-training to inference. We first delineate and analyze the problems of handling long-context input and output with current Transformer-based models. We then offer a holistic taxonomy to navigate the landscape of Transformer architecture upgrades that address these problems. Afterward, we investigate widely used evaluation necessities tailored for long-context LLMs, including datasets, metrics, and baseline models, as well as optimization toolkits such as libraries, systems, and compilers that augment LLMs' efficiency and efficacy across different stages. Finally, we discuss the predominant challenges and potential avenues for future research in this domain. Additionally, we have established a repository where we curate relevant literature with real-time updates at https://github.com/Strivin0311/long-llms-learning.