aliyun / aicb


Question about the hyper-parameters of Llama? #12

Closed mama0512 closed 3 weeks ago

mama0512 commented 1 month ago

In workload/Workload_spec_v1.1.csv, I found that the ffn_hidden_sizes differ from those given in the Llama paper, which are shown below:

image

However, in your file, I found:

image

I am confused by the difference. Could you please explain it?

zhouheyang-alibaba commented 1 month ago

The AICB workload suite is based on the description of the Llama 3 405B model structure in the publicly available Llama paper, which can be accessed at the following link: https://arxiv.org/pdf/2407.21783
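
For readers comparing the two sets of numbers, here is a minimal sketch of how an FFN hidden size can be derived from the model dimension, following the rounding scheme used in Meta's reference Llama code. The function name `llama_ffn_hidden_size` is hypothetical, and the 405B settings used below (`ffn_dim_multiplier=1.2`, `multiple_of=4096`) are assumptions that are not stated in this thread. It is not taken from AICB and says nothing about how Workload_spec_v1.1.csv was generated.

```python
from typing import Optional


def llama_ffn_hidden_size(
    dim: int,
    multiple_of: int = 256,
    ffn_dim_multiplier: Optional[float] = None,
) -> int:
    """Sketch of a SwiGLU FFN width: 2/3 of 4*dim, optionally scaled, rounded up."""
    hidden = int(2 * (4 * dim) / 3)            # SwiGLU uses 2/3 of the usual 4*dim
    if ffn_dim_multiplier is not None:
        hidden = int(ffn_dim_multiplier * hidden)  # optional per-model scaling
    # Round up to the next multiple of `multiple_of`.
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


# Original LLaMA sizes (model dimension 4096 and 5120, default rounding).
print(llama_ffn_hidden_size(4096))   # 11008
print(llama_ffn_hidden_size(5120))   # 13824

# Llama 3 405B model dimension, with assumed multiplier and rounding settings.
print(llama_ffn_hidden_size(16384, multiple_of=4096, ffn_dim_multiplier=1.2))  # 53248
```

Under these assumptions, the same formula gives different FFN widths for different model generations, so values taken from the Llama 3 paper need not match those in the original LLaMA paper.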