In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set $S$ to find a best-fitting function $f(x)$ in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query $x$ and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing $S$ into a single task vector $\boldsymbol{\theta}(S)$ and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.
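Below is a minimal sketch of the task-vector view described in the abstract: the demonstrations $S$ are compressed into a single hidden-state vector $\boldsymbol{\theta}(S)$, which is then patched into the transformer when it processes the query $x$ alone. The model name, layer index, prompt format, and "->" separator are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: task-vector ICL with a small Hugging Face model (assumptions: gpt2, layer 6).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"   # assumption: any decoder-only LM could be substituted
layer = 6             # assumption: an intermediate layer to read from / patch into
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Demonstrations S for a toy "country -> capital" task, plus the query x.
demos = [("France", "Paris"), ("Japan", "Tokyo"), ("Italy", "Rome")]
query = "Germany"

# 1) Compress S into a task vector theta(S): run the full ICL prompt and keep
#    the hidden state of the final token at the chosen layer.
icl_prompt = " ".join(f"{x} -> {y}" for x, y in demos) + f" {query} ->"
with torch.no_grad():
    out = model(**tok(icl_prompt, return_tensors="pt"), output_hidden_states=True)
task_vector = out.hidden_states[layer][0, -1]  # theta(S)

# 2) Answer the query from x and theta(S) alone: run a zero-shot prompt and
#    patch the task vector into the same layer at the last token position.
def patch_last_token(module, inputs, output):
    hidden = output[0].clone()
    hidden[0, -1] = task_vector
    return (hidden,) + output[1:]

handle = model.transformer.h[layer - 1].register_forward_hook(patch_last_token)
with torch.no_grad():
    zero_shot = model(**tok(f"{query} ->", return_tensors="pt"))
handle.remove()

# Greedy next-token prediction for the patched zero-shot query.
print(tok.decode(zero_shot.logits[0, -1].argmax()))
```

In this sketch the query behaves as if the demonstrations were present, even though the transformer only sees $x$ plus the single patched vector; whether a given model and layer actually reproduce the target output (e.g. "Berlin") is an empirical question the paper's experiments address.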