URL

https://arxiv.org/abs/2311.00871
Affiliations
- Steve Yadlowsky, N/A
- Lyric Doshi, N/A
- Nilesh Tripuraneni, N/A
Abstract
- Transformer models, notably large language models (LLMs), have the remarkable ability to perform in-context learning (ICL) -- to perform new tasks when prompted with unseen input-output examples without any explicit model training. In this work, we study how effectively transformers can bridge between their pretraining data mixture, comprised of multiple distinct task families, to identify and learn new tasks in-context which are both inside and outside the pretraining distribution. Building on previous work, we investigate this question in a controlled setting, where we study transformer models trained on sequences of $(x, f(x))$ pairs rather than natural language. Our empirical results show transformers demonstrate near-optimal unsupervised model selection capabilities, in their ability to first in-context identify different task families and in-context learn within them when the task families are well-represented in their pretraining data. However when presented with tasks or functions which are out-of-domain of their pretraining data, we demonstrate various failure modes of transformers and degradation of their generalization for even simple extrapolation tasks. Together our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases that create fundamental generalization capabilities.
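Translation (by gpt-3.5-turbo)
Transformer models, especially large language models (LLMs), have a remarkable in-context learning (ICL) ability: when presented with unseen input-output examples, they perform new tasks without explicit model training. In this work, we investigated how effectively transformer models can use their pretraining data mixture to identify different tasks and learn them in-context. We also investigated how this ability works when learning new tasks inside and outside the range of the pretraining data. Building on previous work, we examined this question in a controlled setting using transformer models trained on sequences of $(x, f(x))$ pairs rather than natural language. The empirical results show that transformer models achieve near-optimal performance in unsupervised model selection: when the task families are well represented in the pretraining data, the models can first identify the different task families in-context and then learn within them. However, when given tasks or functions outside the domain of the pretraining data, transformer models exhibit various failure modes and degraded generalization. Furthermore, our results highlight that the impressive ICL abilities of high-capacity sequence models are tied more closely to the coverage of their pretraining data than to inductive biases that produce fundamental generalization capabilities.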
Summary (by gpt-3.5-turbo)
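This work investigated the in-context learning (ICL) ability of transformer models. Transformer models can identify and learn different tasks within the range of their pretraining data. For tasks or functions outside that range, however, generalization was shown to degrade. The work also highlights that the ICL ability of high-capacity sequence models is closely tied to the coverage of the pretraining data.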
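
To make the controlled setting concrete, here is a minimal sketch of how a prompt of $(x, f(x))$ pairs could be drawn from a mixture of task families. It is not the authors' code: the two families (`dense_linear`, `sparse_linear`), the Gaussian sampling, the mixture weights, and every function name below are assumptions chosen for illustration only.

```python
import random

def sample_dense_linear(dim, rng):
    """Hypothetical task family: f(x) = w . x with a dense Gaussian weight vector."""
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))

def sample_sparse_linear(dim, rng, nnz=2):
    """Hypothetical task family: same form, but only `nnz` weights are nonzero."""
    active = set(rng.sample(range(dim), nnz))
    w = [rng.gauss(0.0, 1.0) if i in active else 0.0 for i in range(dim)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))

def sample_prompt(families, mixture_weights, n_examples, dim, rng):
    """Pick a family from the mixture, then a function, then (x, f(x)) pairs."""
    name = rng.choices(list(families), weights=mixture_weights, k=1)[0]
    f = families[name](dim, rng)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_examples)]
    return name, [(x, f(x)) for x in xs]

# Assumed two-family mixture; the names and weights are illustrative.
FAMILIES = {"dense_linear": sample_dense_linear, "sparse_linear": sample_sparse_linear}

rng = random.Random(0)
name, prompt = sample_prompt(FAMILIES, [0.5, 0.5], n_examples=8, dim=4, rng=rng)
print(name, len(prompt))
```

Under these assumptions, "inside the pretraining distribution" would mean a prompt drawn from one of the families in `FAMILIES`, while an out-of-domain prompt would use a function that none of the families can produce.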

Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models, Steve Yadlowsky+, N/A, arXiv'23 #1117
