Open shenxiangzhuang opened 8 months ago
from transformers import GPT2Tokenizer, GPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')
Parameter count: sum(p.numel() for p in model.parameters()) = 124439808
GPT2Model(
(wte): Embedding(50257, 768)
(wpe): Embedding(1024, 768)
(drop): Dropout(p=0.1, inplace=False)
(h): ModuleList(
(0-11): 12 x GPT2Block(
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): GPT2Attention(
(c_attn): Conv1D()
(c_proj): Conv1D()
(attn_dropout): Dropout(p=0.1, inplace=False)
(resid_dropout): Dropout(p=0.1, inplace=False)
)
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): GPT2MLP(
(c_fc): Conv1D()
(c_proj): Conv1D()
(act): NewGELUActivation()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
wte: 50257 * 768 = 38597376
wpe: 1024 * 768 = 786432
drop: 0
GPT2Block: 7087872
ln_f: 768 * 2 = 1536

model = wte + wpe + drop + 12 * GPT2Block + ln_f
      = 38597376 + 786432 + 0 + 12 * 7087872 + 1536
      = 124439808
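The arithmetic above can be checked without downloading the model; a minimal sketch, with all shapes taken from the printed module tree:

```python
# Verify the GPT-2 (small) parameter arithmetic from the module shapes above.
n_vocab, n_ctx, d, n_layer = 50257, 1024, 768, 12

wte = n_vocab * d                              # token embedding
wpe = n_ctx * d                                # position embedding
ln = 2 * d                                     # LayerNorm: weight + bias
attn = (d * 3 * d + 3 * d) + (d * d + d)       # c_attn + c_proj
mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)    # c_fc + c_proj
block = ln + attn + ln + mlp                   # ln_1 + attn + ln_2 + mlp

total = wte + wpe + n_layer * block + ln       # last ln is the final ln_f
print(total)  # 124439808
```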
GPT2Block(
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): GPT2Attention(
(c_attn): Conv1D()
(c_proj): Conv1D()
(attn_dropout): Dropout(p=0.1, inplace=False)
(resid_dropout): Dropout(p=0.1, inplace=False)
)
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): GPT2MLP(
(c_fc): Conv1D()
(c_proj): Conv1D()
(act): NewGELUActivation()
(dropout): Dropout(p=0.1, inplace=False)
)
)
ln_1: 768 * 2 = 1536
attn: 2362368
ln_2: 768 * 2 = 1536
mlp: 4722432

GPT2Block = ln_1 + attn + ln_2 + mlp
          = 1536 + 2362368 + 1536 + 4722432
          = 7087872
Conv1D (see the Conv1D implementation in the transformers source): Conv1D(nf, nx) stores an nx x nf weight matrix plus an nf-dim bias, i.e. nx * nf + nf parameters.

GPT2Attention(
(c_attn): Conv1D()
(c_proj): Conv1D()
(attn_dropout): Dropout(p=0.1, inplace=False)
(resid_dropout): Dropout(p=0.1, inplace=False)
)
c_attn: 3 * 768 * 768 + 3 * 768 = 1771776
    self.c_attn = Conv1D(3 * self.embed_dim, self.embed_dim)
c_proj: 768 * 768 + 768 = 590592
attn_dropout: 0
resid_dropout: 0

GPT2Attention = c_attn + c_proj + attn_dropout + resid_dropout
              = 1771776 + 590592 + 0 + 0
              = 2362368
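Since Conv1D(nf, nx) holds an nx x nf weight plus an nf bias, the attention counts reduce to one small helper. A sketch; conv1d_params is my own name for illustration, not a transformers API:

```python
def conv1d_params(nf, nx):
    # transformers' Conv1D(nf, nx): weight of shape (nx, nf) plus bias of shape (nf,)
    return nx * nf + nf

d = 768
c_attn = conv1d_params(3 * d, d)   # QKV projection
c_proj = conv1d_params(d, d)       # output projection
print(c_attn + c_proj)  # 2362368
```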
GPT2MLP(
(c_fc): Conv1D()
(c_proj): Conv1D()
(act): NewGELUActivation()
(dropout): Dropout(p=0.1, inplace=False)
)
c_fc: 4 * 768 * 768 + 4 * 768 = 2362368
    self.c_fc = Conv1D(intermediate_size, embed_dim)
c_proj: 768 * 4 * 768 + 768 = 2360064
    self.c_proj = Conv1D(embed_dim, intermediate_size)
act: 0
dropout: 0

GPT2MLP = c_fc + c_proj + act + dropout
        = 2362368 + 2360064 + 0 + 0
        = 4722432
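The same Conv1D arithmetic covers the MLP, where intermediate_size = 4 * 768. Again a sketch with a hypothetical helper, not library code:

```python
def conv1d_params(nf, nx):
    # transformers' Conv1D(nf, nx): weight (nx, nf) + bias (nf,)
    return nx * nf + nf

d, inter = 768, 4 * 768
c_fc = conv1d_params(inter, d)    # expand 768 -> 3072
c_proj = conv1d_params(d, inter)  # project 3072 -> 768
print(c_fc + c_proj)  # 4722432
```

Note the asymmetry: c_fc and c_proj share the same weight count (4 * 768 * 768) but differ in bias size (3072 vs 768), which is why the two totals differ by 2304.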
HuggingFace Model Parameter Viz by Sankey plot?