TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models
https://transformerlensorg.github.io/TransformerLens/
MIT License

Add a helper function to display vectors of logits nicely #112

Open neelnanda-io opened 1 year ago

neelnanda-io commented 1 year ago

Often you want to look at vectors over the vocabulary (e.g. the logits at a specific position). These have >50,000 dimensions, which makes them hard to interpret! I want there to be nice utils to visualize a vector like this.

An MVP would be a function mapping this to a pandas dataframe, with the token index, token string value, logit, log prob and probability, either for just the top K or for the entire vocab.
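A minimal sketch of what that MVP could look like, using only the standard library so it runs anywhere. The function name `logits_to_rows` and the `id_to_token` callable are hypothetical, not part of TransformerLens; a real version would presumably take a tensor and a tokenizer instead.

```python
import math


def logits_to_rows(logits, id_to_token, top_k=None):
    """Return one dict per token: token_index, token, logit, log_prob, prob.

    `logits` is a flat sequence of floats over the vocabulary and
    `id_to_token` maps a token index to its string (both are assumed inputs).
    """
    # Numerically stable log-softmax: subtract the max before exponentiating.
    max_logit = max(logits)
    log_z = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))

    rows = [
        {
            "token_index": i,
            "token": id_to_token(i),
            "logit": x,
            "log_prob": x - log_z,
            "prob": math.exp(x - log_z),
        }
        for i, x in enumerate(logits)
    ]
    # Highest logit first; keep the whole vocab unless top_k is given.
    rows.sort(key=lambda r: r["logit"], reverse=True)
    return rows if top_k is None else rows[:top_k]


# Toy four-token vocabulary standing in for the real >50,000-entry one.
toy_vocab = ["the", " cat", " sat", "<|endoftext|>"]
toy_logits = [2.0, 0.5, -1.0, 0.0]
top2 = logits_to_rows(toy_logits, toy_vocab.__getitem__, top_k=2)
```

With pandas installed, `pd.DataFrame(top2)` turns the rows into the dataframe described above, one column per key.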

But I expect there are many ways to make something nice here! One option is to imitate nostalgebraist's graphing style for `plot_logit_lens` in `transformer_utils`. This takes a layer x position x d_vocab tensor and visualises it as a layer x position heatmap, printing the string value of the top token in each cell and colouring each cell by that top token's value.
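The reduction behind that kind of heatmap can be sketched roughly as follows. This is not nostalgebraist's implementation, just an illustration on nested lists; `top_token_grid` and `id_to_token` are hypothetical names.

```python
def top_token_grid(logits, id_to_token):
    """Reduce [layer][position][d_vocab] logits to per-cell (token, value)."""
    grid = []
    for layer in logits:
        row = []
        for vocab_logits in layer:
            # Index of the largest logit at this layer and position.
            best = max(range(len(vocab_logits)), key=vocab_logits.__getitem__)
            row.append((id_to_token(best), vocab_logits[best]))
        grid.append(row)
    return grid


# Toy input: 2 layers x 2 positions x 3 vocabulary entries.
toy_vocab = ["a", "b", "c"]
toy_logits = [
    [[0.1, 0.9, 0.0], [2.0, 1.0, 0.5]],  # layer 0, positions 0 and 1
    [[0.3, 0.2, 1.5], [0.0, 3.0, 1.0]],  # layer 1, positions 0 and 1
]
grid = top_token_grid(toy_logits, toy_vocab.__getitem__)
labels = [[tok for tok, _ in row] for row in grid]  # text printed in each cell
values = [[val for _, val in row] for row in grid]  # number used for the colour
```

`labels` and `values` are both layer x position, so any heatmap routine that accepts a matrix of values plus per-cell annotations could draw them.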


sheikheddy commented 1 year ago

I recommend http://circos.ca/intro/circular_approach/.

Python implementations: https://github.com/ponnhide/pyCircos or https://github.com/moshi4/pyCirclize

sheikheddy commented 1 year ago

Okay, I'm going to put down some rough thoughts:

> Often you want to look at vectors over the vocabulary (eg the logits at a specific position). This is >50,000 dimensions and this is hard to interpret! I want there to be nice utils to visualize a vector like this.

A more explicit way to put it:

| Encoding name | OpenAI models |
| --- | --- |
| gpt2 (or r50k_base) | Most GPT-3 models (and GPT-2) |
| p50k_base | Code models, text-davinci-002, text-davinci-003 |
| cl100k_base | text-embedding-ada-002 |

Let's start with this snippet from https://github.com/openai/tiktoken:

from tiktoken.load import data_gym_to_mergeable_bpe_ranks

def gpt2():
    mergeable_ranks = data_gym_to_mergeable_bpe_ranks(
        vocab_bpe_file="https://openaipublic.blob.core.windows.net/gpt-2/encodings/main/vocab.bpe",
        encoder_json_file="https://openaipublic.blob.core.windows.net/gpt-2/encodings/main/encoder.json",
    )
    return {
        "name": "gpt2",
        "explicit_n_vocab": 50257,
        "pat_str": r"""'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+""",
        "mergeable_ranks": mergeable_ranks,
        "special_tokens": {"<|endoftext|>": 50256},
    }

For clarity, here are a few assumptions:

Here's a few ideas:

All of this sounds a bit overkill for a helper function, but if fully realized, I think it'd be a really neat tool.

sheikheddy commented 1 year ago

I'll try to put a prototype up this weekend

neelnanda-io commented 1 year ago

Thanks! I'll admit that those takes were too in depth for me to really get my head around them, but it sounded interesting and I would love to see a prototype

On Thu, 2 Mar 2023, 14:14 sheikheddy wrote:

> I'll try to put a prototype up this weekend


sheikheddy commented 1 year ago

Still working on this, have some links in the meantime

https://observablehq.com/@bstaats/graph-visualization-introduction
https://observablehq.com/@observablehq/why-use-a-radial-data-visualization
https://observablehq.com/@kerryrodden/equal-area-radial-matrix-of-lgbt-rights
https://observablehq.com/@mbostock/polar-clock

sheikheddy commented 1 year ago

Seems like this would be a contribution to https://github.com/alan-cooney/CircuitsVis/blob/main/python/circuitsvis/logits.py, not TransformerLens?

neelnanda-io commented 1 year ago

Ah, yes, if you're imagining a real interactive visualisation, putting it in CircuitsVis seems more natural. It's set up to make it easy to integrate JavaScript and Python code.

On Tue, 7 Mar 2023 at 15:05, sheikheddy wrote:

> Seems like this would be a contribution to https://github.com/alan-cooney/CircuitsVis/blob/main/python/circuitsvis/logits.py, not TransformerLens?


jbloomAus commented 1 year ago

@sheikheddy @neelnanda-io What's the plan here? Do we need an interactive visualization or will something else do?

abdurraheemali commented 1 year ago

For a non-interactive visualization, flame graphs do pretty well: https://www.brendangregg.com/blog/2017-02-06/flamegraphs-vs-treemaps-vs-sunburst.html

(I'm @sheikheddy from an alt-account)