ArthurConmy opened this issue 1 year ago
Unclear what the solution should be.

There are plausibly three different parameter counts that are helpful:

- Parameters in training
- Parameters ignoring embeddings
- Parameters used now (e.g. folding LayerNorm deletes some parameters)

I would appreciate people stating which parameter counts are most helpful to them.
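As a rough sketch of how the three counts could differ in practice (assumptions: the `fold_ln` flag and the `embed.W_E` / `pos_embed.W_pos` / `unembed.W_U` attribute names follow the current TransformerLens code, but this is illustrative, not an official API for reporting parameter counts):

```python
# Rough sketch, not an official TransformerLens feature: compare the three counts
# for a small model. Attribute names are assumptions based on the current codebase.
from transformer_lens import HookedTransformer

raw = HookedTransformer.from_pretrained("gpt2", fold_ln=False)    # weights as trained
folded = HookedTransformer.from_pretrained("gpt2", fold_ln=True)  # LayerNorm folded away

def total_params(model):
    # HookedTransformer is an nn.Module, so this is the plain PyTorch total
    return sum(p.numel() for p in model.parameters())

def embedding_params(model):
    # token embedding, positional embedding, and unembedding matrices
    return (model.embed.W_E.numel()
            + model.pos_embed.W_pos.numel()
            + model.unembed.W_U.numel())

print("parameters in training:    ", total_params(raw))
print("parameters sans embeddings:", total_params(raw) - embedding_params(raw))
print("parameters used now:       ", total_params(folded))
```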
IMO this should be just total parameters, for simplicity and alignment with the Pythia suite. Who cares about LayerNorm.
Describe the bug
The n_params counts calculated here are wrong. For example, LLaMA uses SwiGLU, so the 2x factor in the linked code is wrong. Further, this just ignores bias parameters, I think?

Code example
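For concreteness, a back-of-the-envelope sketch of the per-layer MLP weight count, using assumed LLaMA-7B-like dimensions (illustrative only, not taken from the linked code), showing how a hard-coded 2x factor undercounts a gated SwiGLU MLP and how biases get skipped:

```python
# Weight-only MLP parameter counts for one layer; dimensions are assumed
# (roughly LLaMA-7B-sized) and purely illustrative.
d_model, d_mlp = 4096, 11008

gelu_mlp = 2 * d_model * d_mlp    # W_in + W_out (what a 2x factor assumes)
swiglu_mlp = 3 * d_model * d_mlp  # W_in + W_gate + W_out

print(gelu_mlp)    # 90177536
print(swiglu_mlp)  # 135266304 -> 50% more per layer than the 2x estimate

# GPT-2-style MLPs also carry biases that a pure weight count skips:
bias_params = d_mlp + d_model     # b_in + b_out
print(bias_params)                # 15104
```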
System Info: N/A
Additional context: N/A