Hi,
Let me try to answer your questions:
The precomputed latents are the latent representations of the last token after each transformer block. The purpose is to save computational time, since Llama-2 forward passes are slow.
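For concreteness, here is a minimal sketch (not the repository's actual extraction code; the model name and prompt below are placeholders) of how such last-token latents can be computed with Hugging Face `transformers`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup; the repo's own notebooks handle this differently.
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = 'Français: "fleur" - 中文: "'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple: the embedding output plus the output of each
# transformer block. Keeping only the last token's vector per layer gives
# exactly the kind of "precomputed latents" described above.
last_token_latents = torch.stack(
    [h[0, -1, :] for h in out.hidden_states]
)  # shape: (n_layers + 1, d_model)
```

Collecting these per-layer vectors over many prompts is what the dataset on huggingface saves you from recomputing.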
The steering experiment goes beyond the paper, but it is cute, which is why I included it in one of the Colab notebooks.
No, the main code required to reproduce the results from the paper is contained in the .ipynb files in this repository. The Colab link provided is just a nice way to explore some prompts by hand and inspect their logit lens decodings.
Best, Chris
Thank you so much for clearing this up. :)
Regarding the steering experiment, could you please point me to some resources I could refer to?
I would appreciate it a lot.
We are basically following a strategy similar to https://arxiv.org/abs/2312.06681.
In our case, we can very easily create paired prompts that differ only in language, and we don't even need the (A)/(B) prompt structure that they used.
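For example (purely illustrative; the exact prompt templates used in the repository may differ), such pairs could look like this:

```python
# Hypothetical translation prompts that differ only in the decoding language.
french_words = ["fleur", "chat", "livre"]

prompts_ru = [f'Français: "{w}" - Русский: "' for w in french_words]  # decode into Russian
prompts_zh = [f'Français: "{w}" - 中文: "' for w in french_words]      # decode into Chinese
```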
Here are some of my thoughts on the topic of steering:
Superposition theory (toy models of superposition, monosemanticity) suggests that neural networks represent features as vectors (e.g., directions in the space of neuron activations, or in the residual stream).
Let's suppose this holds. A latent at layer $i$ takes the form $$z = \sum_{j} \alpha_j f_j,$$ with $\alpha_j \in \mathbb{R}$ and $f_j \in \mathbb{R}^d$.
Given this linear structure, a representation that leverages an abstract concept space for dealing with, e.g., multilingual data could look like this: $$z = z_{\text{concept}} + z_{\text{decoding language}} + z_{\text{rest}}.$$
Now, if we had a method to compute $z_{\text{decoding language}}$ or $\triangle = z_{\text{target language}} - z_{\text{source language}}$, we could change the output language by the following intervention: $$z' = z - z_{\text{source language}} + z_{\text{target language}} = z + \triangle.$$
Let's consider, e.g., $\ell_1 = \text{RU}$ as source language and $\ell_2 = \text{ZH}$ as target language, and the following simplified model $$z = z_{\text{target language}} + z_{\text{rest}},$$ with $z_{\text{rest}} \sim N(0, \sigma)$.
We can estimate $z_{\ell}$ using a dataset of latents $D_{\ell}$, with $|D_{\ell}| = n$, that all share the feature $z_{\ell} \in \mathbb{R}^d$:
$$z_{\ell} \approx \frac{1}{n}\sum_{z \in D_{\ell}} z = z_{\ell} + \underbrace{\frac{1}{n} \sum_{k} z_{r,k}}_{\approx 0},$$ where $z_{r,k}$ denotes the rest component of the $k$-th example.
We can drop the assumption $z_{\text{rest}} \sim N(0, \sigma)$ by observing that $$\mu_{\ell} = \frac{1}{n} \sum_{z \in D_{\ell}} z = z_{\ell} + \mu_r,$$ since $z_{\ell}$ is shared among all examples. As a result, we can compute $\triangle$ as the difference $$\triangle = \mu_{\ell_2} - \mu_{\ell_1} = \mu_r + z_{\ell_2} - \mu_r - z_{\ell_1} = z_{\ell_2} - z_{\ell_1},$$ assuming the mean $\mu_r$ of the rest component is the same for both languages, which is plausible when the prompts are paired.
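Putting the pieces together, here is a hedged sketch of this mean-difference steering. It assumes hypothetical tensors `latents_ru` and `latents_zh` holding last-token latents at one layer (e.g., sliced out of the precomputed latents), a `model`/`tokenizer` loaded as in the earlier sketch, and an arbitrary choice of intervention layer; it is not the exact code of the linked Colab.

```python
import torch

# Assumptions (not from the repo): `latents_ru` / `latents_zh` have shape
# (n_examples, d_model) and hold last-token latents of Russian- and
# Chinese-decoding prompts at layer LAYER; `model` is a Llama-style model
# whose decoder blocks live at model.model.layers.
LAYER = 20  # hypothetical choice of intervention layer

mu_ru = latents_ru.mean(dim=0)   # \mu_{\ell_1}
mu_zh = latents_zh.mean(dim=0)   # \mu_{\ell_2}
delta = mu_zh - mu_ru            # \triangle = z_{\ell_2} - z_{\ell_1}

def add_delta(module, inputs, output):
    # A LlamaDecoderLayer returns a tuple whose first element is the hidden
    # states; shift every position by \triangle (one could also restrict
    # the shift to the last token).
    hidden = output[0] + delta.to(output[0].device, output[0].dtype)
    return (hidden,) + output[1:]

handle = model.model.layers[LAYER].register_forward_hook(add_delta)
try:
    enc = tokenizer('Français: "fleur" - Русский: "', return_tensors="pt").to(model.device)
    steered = model.generate(**enc, max_new_tokens=10)
    print(tokenizer.decode(steered[0]))
finally:
    handle.remove()  # detach the hook so later generations are unaffected
```

A scaling factor on $\triangle$ is a common knob for trading off steering strength against fluency.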
Thank you so much for these resources. 🙌
Happy to help!
Thank you for sharing your code.
I am trying to understand a few things here and have three questions. As mentioned in the repo:
> For your convenience, we also provide some precomputed latents on [huggingface.](https://huggingface.co/datasets/wendlerc/llm-latent-language) Here are some [preliminary steering experiments](https://colab.research.google.com/drive/1EhCk3_CZ_nSfxxpaDrjTvM-0oHfN9m2n?usp=sharing) using the precomputed latents.
1. What are those precomputed latents, and what is their purpose?
2. What is the steering experiment provided in the Colab doing? I could not find the details in the paper.
3. Is the Colab link provided (https://colab.research.google.com/drive/1l6qN-hmCV4TbTcRZB5o6rUk_QPHBZb7K?usp=sharing) the main code for visualizing the English pivot?
Thank you