DLLXW / baby-llama2-chinese

A repo for pretraining from scratch + SFT-ing a small-parameter Chinese LLaMA-2; a single 24GB GPU is enough to train a chat-llama2 with basic Chinese Q&A ability.
MIT License
2.42k stars · 296 forks

How is the parameter count calculated? #5

Closed · CanvaChen closed this 1 year ago

CanvaChen commented 1 year ago

The README.md gives the 50M-parameter config as: max_seq_len = 512, dim = 512, n_layers = 8, n_heads = 8.

How is the parameter count calculated from this?

DLLXW commented 1 year ago

> The README.md gives the 50M-parameter config as: max_seq_len = 512, dim = 512, n_layers = 8, n_heads = 8.
>
> How is the parameter count calculated from this?

The parameter-count calculation is in model.py; take a careful look at that file.
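
For readers without the code at hand, here is a minimal sketch of how the count falls out of this config for a LLaMA-2-style decoder. It is a hypothetical standalone estimate, not the repo's model.py: `vocab_size = 64793` and `multiple_of = 32` (the SwiGLU rounding granularity) are assumptions. Note that n_heads does not change the total (the q/k/v/o projections are dim × dim regardless of how they are split into heads), and max_seq_len adds no learned parameters because LLaMA-2 uses rotary position embeddings rather than learned ones.

```python
def llama2_param_count(dim: int, n_layers: int, vocab_size: int,
                       multiple_of: int = 32) -> int:
    """Estimate the parameter count of a LLaMA-2-style decoder (no biases)."""
    # SwiGLU hidden size, following the usual LLaMA recipe:
    # start from 4*dim, shrink by 2/3, round up to a multiple of `multiple_of`.
    hidden_dim = int(2 * (4 * dim) / 3)
    hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

    attn = 4 * dim * dim        # wq, wk, wv, wo projections
    ffn = 3 * dim * hidden_dim  # w1, w2, w3 of the SwiGLU feed-forward
    norms = 2 * dim             # two RMSNorm weight vectors per layer
    per_layer = attn + ffn + norms

    embedding = vocab_size * dim  # token embedding; output head assumed tied
    final_norm = dim
    return n_layers * per_layer + embedding + final_norm

if __name__ == "__main__":
    # Config from the README; vocab_size 64793 is an assumption
    # (a ChatGLM2-style tokenizer size commonly used for Chinese models).
    total = llama2_param_count(dim=512, n_layers=8, vocab_size=64793)
    print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the script prints roughly 58.5M parameters, of which about 25M sit in the transformer blocks themselves, so whether the model is quoted as "50M" depends mainly on the tokenizer's vocabulary size and on whether the (tied) embedding is included in the count.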