In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set $S$ to find a best-fitting function $f(x)$ in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query $x$ and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing $S$ into a single task vector $\boldsymbol{\theta}(S)$ and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.
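Below is a minimal sketch of the task-vector view described in the abstract: the demonstrations $S$ are compressed into a single hidden-state vector $\boldsymbol{\theta}(S)$, which is then patched into the transformer when it processes the query $x$ alone. The model name, layer index, prompt format, and "->" separator are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: task-vector ICL with a small Hugging Face model (assumptions: gpt2, layer 6).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"   # assumption: any decoder-only LM could be substituted
layer = 6             # assumption: an intermediate layer to read from / patch into
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Demonstrations S for a toy "country -> capital" task, plus the query x.
demos = [("France", "Paris"), ("Japan", "Tokyo"), ("Italy", "Rome")]
query = "Germany"

# 1) Compress S into a task vector theta(S): run the full ICL prompt and keep
#    the hidden state of the final token at the chosen layer.
icl_prompt = " ".join(f"{x} -> {y}" for x, y in demos) + f" {query} ->"
with torch.no_grad():
    out = model(**tok(icl_prompt, return_tensors="pt"), output_hidden_states=True)
task_vector = out.hidden_states[layer][0, -1]  # theta(S)

# 2) Answer the query from x and theta(S) alone: run a zero-shot prompt and
#    patch the task vector into the same layer at the last token position.
def patch_last_token(module, inputs, output):
    hidden = output[0].clone()
    hidden[0, -1] = task_vector
    return (hidden,) + output[1:]

handle = model.transformer.h[layer - 1].register_forward_hook(patch_last_token)
with torch.no_grad():
    zero_shot = model(**tok(f"{query} ->", return_tensors="pt"))
handle.remove()

# Greedy next-token prediction for the patched zero-shot query.
print(tok.decode(zero_shot.logits[0, -1].argmax()))
```

In this sketch the query behaves as if the demonstrations were present, even though the transformer only sees $x$ plus the single patched vector; whether a given model and layer actually reproduce the target output (e.g. "Berlin") is an empirical question the paper's experiments address.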