TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models
https://transformerlensorg.github.io/TransformerLens/
MIT License

[Bug Report] Pythia models / Rotary Embeddings don't match Huggingface. #385

Open UFO-101 opened 11 months ago

UFO-101 commented 11 months ago

Describe the bug
Pythia model outputs don't exactly match the Hugging Face Transformers implementation.

Code example

import torch
from transformer_lens import HookedTransformer
from transformers import AutoModelForCausalLM

def check_similarity_with_hf_model(tl_model, hf_model, atol, prompt="Hello, world!"):
    tokens = tl_model.tokenizer.encode(prompt, return_tensors="pt")
    logits = tl_model(tokens, prepend_bos=False)
    hf_logits = hf_model(tokens).logits
    assert torch.allclose(torch.softmax(logits, dim=-1), torch.softmax(hf_logits, dim=-1), atol=atol)

model_name = "EleutherAI/pythia-70m"
tl_model = HookedTransformer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name)
check_similarity_with_hf_model(tl_model, hf_model, atol=1e-5)

This fails with model_name = "EleutherAI/pythia-70m", but passes with every other model I tried. It passes with pythia-70m if I set atol=0.1. Arthur says it works for him with atol=1e-3.
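
A quick way to quantify the deviation, instead of the pass/fail allclose above (a diagnostic snippet added here for illustration, reusing tl_model and hf_model from the example):

tokens = tl_model.tokenizer.encode("Hello, world!", return_tensors="pt")
tl_probs = torch.softmax(tl_model(tokens, prepend_bos=False), dim=-1)
hf_probs = torch.softmax(hf_model(tokens).logits, dim=-1)
print((tl_probs - hf_probs).abs().max())  # largest absolute difference in next-token probabilities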


Additional context
See discussion in the Open Source Mechanistic Interpretability Slack here: https://opensourcemechanistic.slack.com/archives/C04SRRE96UV/p1695593544494209


jettjaniak commented 11 months ago

I can confirm the finding on macOS, TL 1.6.1, Python 3.9.6, PyTorch 2.0.1, both on CPU and MPS, and for all the small Pythia models (14M, 31M, 70M).

The relevant HF line is https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_neox/modeling_gpt_neox.py#L182. I went there with a debugger and compared the values to hook_rot_k / hook_rot_q for a few positions and a few heads in layer 0 on pythia-14m. There is no difference, so I don't think RoPE is the issue.
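
For anyone who wants to repeat that comparison, a sketch of pulling the post-rotary queries/keys out of the TransformerLens cache (the hook names are the ones mentioned above; the model and prompt are just illustrative):

from transformer_lens import HookedTransformer

tl_model = HookedTransformer.from_pretrained_no_processing("EleutherAI/pythia-14m")
tokens = tl_model.to_tokens("Hello, world!", prepend_bos=False)
_, cache = tl_model.run_with_cache(tokens)

# Shapes are [batch, pos, head_index, d_head] after the rotary embedding in layer 0
rot_q = cache["blocks.0.attn.hook_rot_q"]
rot_k = cache["blocks.0.attn.hook_rot_k"]
print(rot_q[0, :3, 0])  # first few positions, head 0, to compare against the HF values
print(rot_k[0, :3, 0])

The Hugging Face side still has to be read off at the linked line with a debugger, since the rotated tensors there aren't exposed as module outputs.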

UFO-101 commented 11 months ago

Oh ok good spot, thank you!

ArthurConmy commented 11 months ago

For me (on both GPU and CPU) even GPT-2 attention patterns fail torch.testing.assert_allclose with rtol=atol=1e-6. Though yeah, 1e-3 is pretty bad. @UFO-101 you're on macOS; are you using CPU or MPS?

UFO-101 commented 11 months ago

CPU

jbloomAus commented 11 months ago

I've assigned myself to this, as I've started trying to debug it. The most likely culprit seems to be a deviation in calculate_sin_cos_rotary, but fixing that doesn't stop the deviation from compounding as layers increase. Will post updates here when I know more.

ed1d1a8d commented 10 months ago

This issue also exists with Llama-2-7b-chat-hf, and it is really bad. The following code fails even with atol=1 (it passes with atol=2). Code:

import torch
from transformer_lens import HookedTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

def check_similarity_with_hf_model(
    tl_model: HookedTransformer,
    hf_model: AutoModelForCausalLM,
    atol: float,
    prompt="Hello world!",
):
    tokens = tl_model.tokenizer.encode(prompt, return_tensors="pt").cuda()
    tl_logits = tl_model(tokens, prepend_bos=False)
    hf_logits = hf_model(tokens).logits
    assert torch.allclose(tl_logits, hf_logits, atol=atol)

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"
hf_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float32,
).cuda()
tl_model = HookedTransformer.from_pretrained(
    MODEL_NAME,
    hf_model=AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float32,
    ),
    tokenizer=AutoTokenizer.from_pretrained(MODEL_NAME),
    device="cuda",
    n_devices=1,
    move_to_device=True,
    fold_ln=False,
    fold_value_biases=False,
    center_writing_weights=False,
    center_unembed=False,
    torch_dtype=torch.float32,
)

with torch.no_grad():
    check_similarity_with_hf_model(tl_model, hf_model, atol=1)

yuxili19 commented 9 months ago

(Quotes ed1d1a8d's comment and code above in full.)

@ed1d1a8d I've run into the same problem. Have you solved it already?

ed1d1a8d commented 9 months ago

Have not solved it unfortunately. Though I didn't spend that much more time looking into the issue.

nix-apollo commented 9 months ago

I think I found at least one part of what is going wrong. Have a look at the attention scores of head 5.0 on this example:

from transformer_lens import utils, HookedTransformer

model_name = "EleutherAI/pythia-70m"
tl_model = HookedTransformer.from_pretrained_no_processing(model_name)

string = "Hello, world! Pad Pad Pad"
tokens = tl_model.to_tokens(string)
logits, cache = tl_model.run_with_cache(tokens, prepend_bos=False)

cache[utils.get_act_name("attn_scores", 5)][0, 2, 4]

Outputs:

tensor([-110208.0547, -110189.8125, -110206.5234, -110208.4531, -110192.6016,
        -100000.0000, -100000.0000, -100000.0000], device='cuda:0')

That is, the attention head is generating scores smaller than -1e5 for the positions it should be allowed to pay attention to. The way the causal mask works is that the masked positions just get pushed down to -1e5, which should lead to them getting ~0 attention. Here it instead leads to them getting more attention.
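
To see concretely why this breaks the softmax, here are the reported scores pushed through it directly (a standalone check using only the numbers printed above):

import torch

# Scores reported above; the last three keys are the masked positions at -1e5,
# which is *above* the real scores of roughly -1.1e5.
scores = torch.tensor([-110208.0547, -110189.8125, -110206.5234, -110208.4531,
                       -110192.6016, -100000.0000, -100000.0000, -100000.0000])
print(torch.softmax(scores, dim=-1))
# tensor([0., 0., 0., 0., 0., 0.3333, 0.3333, 0.3333])
# Essentially all the probability mass lands on the masked keys, matching the
# TransformerLens pattern shown below.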

Pythia uses flash attention, which I don't know much about; my impression is that it implements the causal mask in a different way. Empirically, if you look at the attention probabilities after the softmax, TransformerLens is paying attention to forbidden positions while the Hugging Face model is not.

TransformerLens: [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3333, 0.3333, 0.3333]
Hugging Face: [1.1518e-08, 9.4122e-01, 5.3260e-08, 7.6729e-09, 5.8777e-02, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]

One follow-up question is whether the attention scores for the Hugging Face model are actually this small, or whether TransformerLens gets these very small scores from somewhere else. I wasn't able to answer this immediately and need to go; I'll investigate more later if someone hasn't solved it before then.

nix-apollo commented 9 months ago

An easy test to run: add a hook in the attention pattern calculation for Pythia that centers the attention scores for every query position (before the mask). Since softmax is invariant to adding a constant per row, this shouldn't change anything other than avoiding the above bug. See if this fixes the comparison errors with Hugging Face.
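
A rough sketch of such a hook (this assumes the mask value is -1e5 and that hook_attn_scores fires after the causal mask has already been applied, so the sketch rebuilds the mask, centers over the allowed keys, and re-applies it, which is equivalent to centering before the mask; everything outside the TransformerLens API here is illustrative):

import torch
from transformer_lens import utils, HookedTransformer

tl_model = HookedTransformer.from_pretrained_no_processing("EleutherAI/pythia-70m")

def center_scores_hook(attn_scores, hook):
    # attn_scores: [batch, head, query_pos, key_pos]. Rebuild the causal mask from
    # positions, since the buggy real scores can themselves sit below -1e5 and so
    # can't be told apart from masked ones by value.
    q_len, k_len = attn_scores.shape[-2], attn_scores.shape[-1]
    causal = torch.tril(torch.ones(q_len, k_len, dtype=torch.bool, device=attn_scores.device))
    # Per-query mean over the allowed keys only
    mean = torch.nanmean(attn_scores.masked_fill(~causal, float("nan")), dim=-1, keepdim=True)
    centered = attn_scores - mean
    # Re-apply the mask far below the (now small) centered scores
    return centered.masked_fill(~causal, -1e9)

tokens = tl_model.to_tokens("Hello, world! Pad Pad Pad", prepend_bos=False)
fwd_hooks = [(utils.get_act_name("attn_scores", layer), center_scores_hook)
             for layer in range(tl_model.cfg.n_layers)]
centered_logits = tl_model.run_with_hooks(tokens, fwd_hooks=fwd_hooks)
# Compare centered_logits against the Hugging Face logits as in the snippets above.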

ArthurConmy commented 9 months ago

@nix-apollo which version of TL are you on? We replaced the -1e5 issue here and here

ArthurConmy commented 9 months ago

Another bit of evidence: @bryce13950 states the current failing CI test is due to a Pythia model:

The model specifically that is causing this fail is the model "EleutherAI/pythia-70m", if that model is removed from line 209 of tests/acceptance/test_hooked_transformer.py the CI then passes.

bryce13950 commented 9 months ago

To unblock the CI, I modified the tolerance from 5e-5 to 5e-4, and it now passes. I'm not sure this is something we want to keep, but I submitted a pull request (https://github.com/neelnanda-io/TransformerLens/pull/451) with this change in case we want to accept slightly less accuracy for now so the CI doesn't block unrelated additions.

ArthurConmy commented 9 months ago

@ed1d1a8d ^ we think the egregious Llama-2 errors are fixed! The errors are now only around 1e-4.

We are working on the dull task of porting TL functions to match HF exactly, which should resolve all further issues, but we haven't finished this yet.

ed1d1a8d commented 9 months ago

Woah thank you so much for taking the time to help resolve this issue 🙏 And I am very excited for exact matching!