Hello,
I am inquiring about the model configuration outlined in your technical report.
In Table 1 of the Gemma technical report, the 7B model lists 'd_model' as 3072.
I understand 'd_model' to be the 'hidden size', which I expected to equal 'Num heads' × 'Head size'. However, 'Num heads' × 'Head size' comes to 4096 for the 7B model, while 'd_model' is listed as 3072. Could you clarify what 'd_model' means here and confirm the correct 'hidden size' for the Gemma 7B model?
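To make the mismatch concrete, here is a minimal sketch of the arithmetic. The values 16 and 256 for 'Num heads' and 'Head size' are my reading of Table 1, and the function and variable names are hypothetical, chosen only for illustration.

```python
# Minimal sketch of the arithmetic behind the question.
# Assumption: num_heads=16 and head_size=256 are my reading of Table 1
# for the 7B model; d_model=3072 is the value quoted above.

def attention_width(num_heads: int, head_size: int) -> int:
    """Total width of the concatenated attention heads (hypothetical helper)."""
    return num_heads * head_size


d_model = 3072
num_heads = 16
head_size = 256

width = attention_width(num_heads, head_size)
print(f"num_heads * head_size = {width}")
print(f"d_model               = {d_model}")
print(f"equal? {width == d_model}")
```

If the two values are really meant to differ, I would guess the query/key/value projections map from 'd_model' to 'Num heads' × 'Head size' and the output projection maps back, but I could not confirm that from the report.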
Thank you.