google-deepmind / gemma

Open weights LLM from Google DeepMind.
http://ai.google.dev/gemma
Apache License 2.0

gemma 7B model configuration #4

Closed llsj14 closed 7 months ago

llsj14 commented 7 months ago

Hello,

I have a question about the model configuration described in your technical report.

In Table 1 of the Gemma technical report, the 7B model lists 'd_model' as 3072.

[Screenshot of Table 1 from the Gemma technical report]

I understand 'd_model' to represent the hidden size, which I would expect to equal 'Num heads × Head size'. I was confused because 'Num heads × Head size' equals 4096, while 'd_model' is listed as 3072. Could you clarify what 'd_model' means here and what the correct hidden size is for the Gemma 7B model?
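For concreteness, the arithmetic behind my confusion (assuming the Num heads = 16 and Head size = 256 values shown in the table above):

```python
# Values from the 7B row of Table 1 (Num heads and Head size taken from the screenshot).
num_heads, head_size, d_model = 16, 256, 3072
print(num_heads * head_size)  # 4096, which does not match d_model = 3072
```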

Thank you.

llsj14 commented 7 months ago

I speculate that this model may simply use different values for the QKV projection dimension ('Num heads × Head size') and 'd_model', as the two do not necessarily need to be identical.
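To illustrate what I mean, here is a minimal sketch in plain JAX (not the repo's actual Flax implementation; all names are illustrative) of an attention block whose Q/K/V projection width (num_heads * head_size = 4096) differs from d_model (3072), with the output projection mapping back to d_model:

```python
# Minimal sketch (not the actual Gemma code): the Q/K/V projections map
# d_model -> num_heads * head_size, and the output projection maps
# num_heads * head_size -> d_model, so the two widths need not agree.
import jax
import jax.numpy as jnp

d_model, num_heads, head_size = 3072, 16, 256   # 7B values from Table 1
qkv_dim = num_heads * head_size                  # 4096 != d_model

key = jax.random.PRNGKey(0)
k_q, k_k, k_v, k_o, k_x = jax.random.split(key, 5)

# Projection weights (the shapes are the point of this sketch).
w_q = jax.random.normal(k_q, (d_model, qkv_dim)) * 0.02
w_k = jax.random.normal(k_k, (d_model, qkv_dim)) * 0.02
w_v = jax.random.normal(k_v, (d_model, qkv_dim)) * 0.02
w_o = jax.random.normal(k_o, (qkv_dim, d_model)) * 0.02

def attention(x):
    # x: [seq_len, d_model]
    seq_len = x.shape[0]
    q = (x @ w_q).reshape(seq_len, num_heads, head_size)
    k = (x @ w_k).reshape(seq_len, num_heads, head_size)
    v = (x @ w_v).reshape(seq_len, num_heads, head_size)
    logits = jnp.einsum("qhd,khd->hqk", q, k) / jnp.sqrt(head_size)
    probs = jax.nn.softmax(logits, axis=-1)
    out = jnp.einsum("hqk,khd->qhd", probs, v).reshape(seq_len, qkv_dim)
    return out @ w_o                             # back to [seq_len, d_model]

x = jax.random.normal(k_x, (8, d_model))
print(attention(x).shape)                        # (8, 3072)
```

If that is the case, 'd_model' in Table 1 would be the width of the residual stream, while attention would operate internally at num_heads × head_size.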