AkihikoWatanabe commented 1 month ago

URL

https://arxiv.org/abs/2406.16838
Affiliations
- Sean Welleck, N/A
- Amanda Bertsch, N/A
- Matthew Finlayson, N/A
- Hailey Schoelkopf, N/A
- Alex Xie, N/A
- Graham Neubig, N/A
- Ilia Kulikov, N/A
- Zaid Harchaoui, N/A
Abstract
- One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve the speed of generation. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.
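The abstract's distinction between token-level and meta-generation algorithms can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the survey: `VOCAB` and `toy_logits` are a hypothetical stand-in for a real language model, and best-of-N reranking is used as one familiar example of a meta-generator that treats the token-level sampler as a black box.

```python
import math
import random

# Hypothetical toy vocabulary; a real model would have tens of thousands of tokens.
VOCAB = ["a", "b", "<eos>"]


def toy_logits(prefix):
    """Stand-in for a language model: one logit per vocabulary item.

    The <eos> logit grows with the prefix length so sequences tend to end.
    """
    return [1.0, 0.5, 0.2 * len(prefix)]


def softmax(logits, temperature=1.0):
    """Turn logits into a next-token distribution, with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(rng, temperature=1.0, max_len=10):
    """Token-level generation: sample a single token at a time."""
    tokens = []
    for _ in range(max_len):
        probs = softmax(toy_logits(tokens), temperature)
        idx = rng.choices(range(len(VOCAB)), weights=probs)[0]
        tokens.append(VOCAB[idx])
        if VOCAB[idx] == "<eos>":
            break
    return tokens


def best_of_n(rng, n, score):
    """Meta-generation: draw n full sequences and keep the best-scoring one.

    `score` is a hypothetical scoring function supplied by the caller,
    for example a reward model or a domain-specific checker.
    """
    candidates = [sample_sequence(rng) for _ in range(n)]
    return max(candidates, key=score)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_sequence(rng))
    print(best_of_n(rng, n=8, score=lambda seq: seq.count("a")))
```

Raising `n` spends more compute at inference time in exchange for a better chance of a high-scoring output, which is the trade-off the abstract refers to as scaling compute during inference.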
Summary (by gpt-4o-mini)
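Focuses on the benefits of scaling compute at inference time and explores three areas in a unified way: token-level generation, meta-generation, and efficient generation. Token-level generation uses decoding algorithms, meta-generation draws on domain knowledge and external information, and efficient generation aims to reduce cost and improve speed. The survey integrates perspectives from traditional natural language processing, modern LLMs, and machine learning systems.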

AkihikoWatanabe commented 1 month ago

Original tweet: https://x.com/gneubig/status/1833522477605261799?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q

AkihikoWatanabe commented 1 month ago

A survey from a CMU team on inference-time algorithms, including methods for speeding up generation.

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models, Sean Welleck+, N/A, arXiv'24 #1386
