ai-glimpse / toyllm
https://ai-glimpse.github.io/toyllm/

LLM: Model Parameter counting #39

Open shenxiangzhuang opened 7 months ago

shenxiangzhuang commented 7 months ago

GPT2

Model

from transformers import GPT2Tokenizer, GPT2Model

# Load the pretrained GPT-2 (small) tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')

Parameter count, via sum(p.numel() for p in model.parameters()): 124439808

GPT2Model(
  (wte): Embedding(50257, 768)
  (wpe): Embedding(1024, 768)
  (drop): Dropout(p=0.1, inplace=False)
  (h): ModuleList(
    (0-11): 12 x GPT2Block(
      (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (attn): GPT2Attention(
        (c_attn): Conv1D()
        (c_proj): Conv1D()
        (attn_dropout): Dropout(p=0.1, inplace=False)
        (resid_dropout): Dropout(p=0.1, inplace=False)
      )
      (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (mlp): GPT2MLP(
        (c_fc): Conv1D()
        (c_proj): Conv1D()
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)

Parameter count

Model

model = wte + wpe + drop + 12 × GPT2Block + ln_f
      = 38597376 + 786432 + 0 + 12 × 7087872 + 1536
      = 124439808
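
These per-module numbers can be checked directly against the loaded model. A minimal sketch, reusing the model object from above (count_params is just a local convenience helper, not a transformers API):

def count_params(module):
    # Sum of element counts over all parameter tensors in a module
    return sum(p.numel() for p in module.parameters())

print(count_params(model.wte))   # 38597376 = 50257 * 768 (token embedding)
print(count_params(model.wpe))   # 786432   = 1024 * 768  (position embedding)
print(count_params(model.drop))  # 0        (dropout has no parameters)
print(count_params(model.h[0]))  # 7087872  (one of the 12 identical GPT2Blocks)
print(count_params(model.ln_f))  # 1536     = 768 weight + 768 bias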

GPT2Block

GPT2Block(
  (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  (attn): GPT2Attention(
    (c_attn): Conv1D()
    (c_proj): Conv1D()
    (attn_dropout): Dropout(p=0.1, inplace=False)
    (resid_dropout): Dropout(p=0.1, inplace=False)
  )
  (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  (mlp): GPT2MLP(
    (c_fc): Conv1D()
    (c_proj): Conv1D()
    (act): NewGELUActivation()
    (dropout): Dropout(p=0.1, inplace=False)
  )
)

GPT2Block = ln_1 + attn + ln_2 + mlp
          = 1536 + 2362368 + 1536 + 4722432
          = 7087872
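
The same check works one level down. A short sketch, assuming the count_params helper defined above:

block = model.h[0]               # any of the 12 identical blocks
print(count_params(block.ln_1))  # 1536 = 768 weight + 768 bias
print(count_params(block.attn))  # 2362368
print(count_params(block.ln_2))  # 1536
print(count_params(block.mlp))   # 4722432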

Attention

GPT2Attention(
  (c_attn): Conv1D()
  (c_proj): Conv1D()
  (attn_dropout): Dropout(p=0.1, inplace=False)
  (resid_dropout): Dropout(p=0.1, inplace=False)
)

GPT2Attention = c_attn + c_proj + attn_dropout + resid_dropout
              = 1771776 + 590592 + 0 + 0
              = 2362368

(the two dropout layers have no parameters)
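
Where the two nonzero numbers come from: c_attn is the fused Q/K/V projection from 768 to 3 × 768 = 2304, and c_proj maps the concatenated heads back to 768. HuggingFace's Conv1D stores its weight as (in_features, out_features), so the shapes can be inspected directly (a quick check, reusing the model from above):

attn = model.h[0].attn
print(tuple(attn.c_attn.weight.shape))  # (768, 2304): fused Q, K, V projection
print(768 * 2304 + 2304)                # 1771776 (weight + bias)
print(tuple(attn.c_proj.weight.shape))  # (768, 768): output projection
print(768 * 768 + 768)                  # 590592  (weight + bias)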

MLP

GPT2MLP(
  (c_fc): Conv1D()
  (c_proj): Conv1D()
  (act): NewGELUActivation()
  (dropout): Dropout(p=0.1, inplace=False)
)

GPT2MLP = c_fc + c_proj + act + dropout
        = 2362368 + 2360064 + 0 + 0
        = 4722432

(the activation and dropout layers have no parameters)
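
Similarly for the MLP: c_fc expands 768 to 4 × 768 = 3072 and c_proj contracts it back to 768, each with a bias:

print(768 * 3072 + 3072)  # 2362368 = c_fc weight + bias
print(3072 * 768 + 768)   # 2360064 = c_proj weight + bias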

shenxiangzhuang commented 7 months ago

How about visualizing HuggingFace model parameters with a Sankey plot?
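
A minimal sketch of what that could look like, assuming plotly is available (any Sankey library would do). The labels and values just restate the counts derived above, with per-block numbers multiplied by 12:

import plotly.graph_objects as go

labels = ["GPT2Model", "wte", "wpe", "12 x GPT2Block", "ln_f",
          "attn (x12)", "mlp (x12)", "ln_1 + ln_2 (x12)"]
fig = go.Figure(go.Sankey(
    node=dict(label=labels),
    link=dict(
        # Flows: model -> top-level modules, then blocks -> their submodules
        source=[0, 0, 0, 0, 3, 3, 3],
        target=[1, 2, 3, 4, 5, 6, 7],
        value=[38597376, 786432, 12 * 7087872, 1536,
               12 * 2362368, 12 * 4722432, 12 * 2 * 1536],
    ),
))
fig.show()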