TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models
https://transformerlensorg.github.io/TransformerLens/

[Bug Report] Fix `n_params` counts #448

Open ArthurConmy opened 1 year ago

ArthurConmy commented 1 year ago

Describe the bug The `n_params` counts calculated here are wrong. For example, LLaMA uses SwiGLU, so the 2x MLP factor in the linked code undercounts its parameters; the count also appears to ignore bias parameters entirely. See the arithmetic sketch below the code example.

Code example

```python
# Load in Llama-7B (loading step elided)
llama.cfg.n_params  # 5033164800 ... but Llama-7B actually has ~6.7B parameters
```
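
For reference, a minimal sketch of the arithmetic, assuming Llama-7B's config values and inferring the current per-layer formula from the reported number:

```python
# Llama-7B shapes, taken from the model config.
n_layers, d_model, n_heads, d_head = 32, 4096, 32, 128
d_mlp, d_vocab = 11008, 32000

# The current count appears to be 4 attention matrices plus a 2-matrix MLP
# per layer, with embeddings and biases excluded.
current = n_layers * (4 * d_model * n_heads * d_head + 2 * d_model * d_mlp)
print(current)  # 5033164800 -- matches llama.cfg.n_params above

# SwiGLU has three MLP matrices (W_gate, W_in, W_out), so the MLP factor
# should be 3 for gated models; embed + unembed add 2 * d_vocab * d_model.
gated = n_layers * (4 * d_model * n_heads * d_head + 3 * d_model * d_mlp)
print(gated + 2 * d_vocab * d_model)  # 6738149376 -- ~6.7B, as expected
```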

System Info N/A

Additional context N/A


ArthurConmy commented 1 year ago

Unclear what the solution should be.

There are plausibly three different parameter counts that are helpful:

  1. Parameters in training
  2. Parameters ignoring embeddings
  3. Parameters used now (e.g. folding LayerNorm deletes some parameters)

I would appreciate people stating which of these parameter counts would be most helpful to them (a sketch of how (3) differs from the checkpoint count is below).
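
As a concrete illustration of (3), a minimal sketch on GPT-2 small, assuming parameters are counted with `sum(p.numel() for p in model.parameters())`:

```python
from transformer_lens import HookedTransformer

# Folding LayerNorm replaces each LN with a parameter-free LayerNormPre,
# so the folded model has fewer parameters than the raw checkpoint.
folded = HookedTransformer.from_pretrained("gpt2", fold_ln=True)
raw = HookedTransformer.from_pretrained(
    "gpt2", fold_ln=False, center_writing_weights=False, center_unembed=False
)

n_folded = sum(p.numel() for p in folded.parameters())
n_raw = sum(p.numel() for p in raw.parameters())
# Expect 25 LayerNorms * 2 * 768 = 38400 parameters deleted for GPT-2 small.
print(n_raw - n_folded)
```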

neelnanda-io commented 1 year ago

IMO this should just be total parameters, for simplicity and alignment with the Pythia suite. Who cares about LayerNorm?
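
For context, a sketch of that convention, assuming the Pythia naming scheme (the advertised size tracks the full checkpoint's parameter count, embeddings included):

```python
from transformer_lens import HookedTransformer

# Load without weight processing so the count stays comparable to the checkpoint.
pythia = HookedTransformer.from_pretrained_no_processing("pythia-160m")
# Pythia names reflect the all-inclusive count (~162M total for pythia-160m).
print(sum(p.numel() for p in pythia.parameters()))
```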
