Large pre-trained models are capable of few-shot in-context learning (ICL), i.e., performing a new task by prepending a few demonstrations before the test input. However, the concatenated demonstrations are often excessively long and induce additional computation. Inspired by fusion-in-decoder (FiD) models, which efficiently aggregate more passages and thus outperform concatenation-based models in open-domain QA, we hypothesize that similar techniques can be applied to improve the efficiency and end-task performance of ICL. To verify this, we present a comprehensive study on applying three fusion methods—concatenation-based (early fusion), FiD (intermediate), and ensemble-based (late)—to ICL. We adopt a meta-learning setup where a model is first trained to perform ICL on a mixture of tasks using one selected fusion method, then evaluated on held-out tasks for ICL. Results on 11 held-out tasks show that FiD-ICL matches or outperforms the other two fusion methods. Additionally, we show that FiD-ICL (1) is 10x faster at inference time compared to concat-based and ensemble-based ICL, as we can easily pre-compute the representations of in-context examples and reuse them; (2) enables scaling up to meta-training 3B-sized models, which would fail for concat-based ICL.
https://virtual2023.aclweb.org/paper_P4254.html
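A minimal sketch (not the paper's implementation) of where each fusion happens, using toy `nn.Transformer` modules with random embeddings; shapes, module sizes, and variable names are illustrative assumptions only. The key contrast is that in FiD-ICL each demonstration is encoded independently, so its encoder states can be pre-computed and reused across test inputs, which is what the claimed inference speedup relies on.

```python
# Illustrative sketch of the three ICL fusion strategies; toy modules, not the authors' code.
import torch
import torch.nn as nn

d_model = 64
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)

# k in-context demonstrations plus one test input, already embedded (hypothetical shapes).
k, len_demo, len_test, len_tgt = 4, 16, 8, 8
demos = torch.randn(k, len_demo, d_model)        # (k, L_demo, d)
test_input = torch.randn(1, len_test, d_model)   # (1, L_test, d)
target = torch.randn(1, len_tgt, d_model)        # decoder-side input

# 1) Concat-based ICL (early fusion): encode one long sequence of all demos + test input.
concat_src = torch.cat([demos.reshape(1, -1, d_model), test_input], dim=1)
concat_mem = encoder(concat_src)                 # (1, k*L_demo + L_test, d)
out_concat = decoder(target, concat_mem)

# 2) FiD-ICL (intermediate fusion): encode each demo independently (pre-computable,
#    reusable across test inputs), then let the decoder cross-attend over the
#    concatenation of all encoder states.
demo_mem = encoder(demos)                        # (k, L_demo, d) -- cacheable
test_mem = encoder(test_input)                   # (1, L_test, d)
fid_mem = torch.cat([demo_mem.reshape(1, -1, d_model), test_mem], dim=1)
out_fid = decoder(target, fid_mem)

# 3) Ensemble-based ICL (late fusion): run the model once per demo and average outputs.
outs = [decoder(target, encoder(torch.cat([demos[i:i + 1], test_input], dim=1)))
        for i in range(k)]
out_ensemble = torch.stack(outs).mean(dim=0)
```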