Ignited by the breakthrough of ChatGPT, Transformer-based Large Language Models (LLMs) have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents. However, a prevailing limitation exists: many current LLMs, constrained by resources, are primarily pre-trained on shorter texts, rendering them less effective for the longer-context prompts commonly encountered in real-world settings. In this paper, we present a comprehensive survey focusing on the advancement of model architecture in Transformer-based LLMs to optimize long-context capabilities across all stages from pre-training to inference. We first delineate and analyze the problems of handling long-context input and output with current Transformer-based models. We then offer a holistic taxonomy to navigate the landscape of Transformer architecture upgrades that address these problems. Afterward, we investigate widely used evaluation necessities tailored for long-context LLMs, including datasets, metrics, and baseline models, as well as optimization toolkits such as libraries, systems, and compilers that augment LLMs' efficiency and efficacy across different stages. Finally, we discuss the predominant challenges and potential avenues for future research in this domain. Additionally, we have established a repository where we curate relevant literature with real-time updates at https://github.com/Strivin0311/long-llms-learning.