Gemma: A study of the effect of the new issues

Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

https://lightning.ai

Apache License 2.0

8.25k stars 843 forks source link

Gemma: A study of the effect of the new issues #1031

Open Andrei-Aksionov opened 4 months ago

Andrei-Aksionov commented 4 months ago

Daniel Han in his blogpost shared his discoveries of what is ~also~ wrong with the Gemma implementation. Some of them are only appliable to Keras, some of them affects PyTorch too. This issue will include information concerning those bugs and ours codebase.

[ ] 1. Must add \<bos>
[ ] 2. There’s a typo for \<end_of_turn>model
[x] 3. $sqrt(3072)=55.4256$ but bfloat16 is $55.5$
[ ] 4. Layernorm (w+1) must be in float32
[ ] 5. Keras mixed_bfloat16 RoPE is wrong
[ ] 6. RoPE is sensitive to $y \cdot \left(\frac{1}{x}\right)$ vs $y/x$
[ ] 7. RoPE should be float32
[x] 8. GELU should be approx tanh not exact
[ ] 9. Additional: check why tests for LoRA and AdapterV2 variants of GemmaMLP fail

Feel free to chime in.

Thanks to @kashif for letting us know.

Andrei-Aksionov commented 4 months ago

Issue # 3

We don't downcast the scale value and keep it in torch.float32. So this one doesn't affect us. https://github.com/Lightning-AI/lit-gpt/blob/f241d94df59d82b2017bfdcd3800ac8779eb45f5/lit_gpt/model.py#L90-L91

Andrei-Aksionov commented 4 months ago

Issue # 8

Was fixed in #1004

awaelchli commented 4 months ago

Issue 2 is ok I think: https://github.com/Lightning-AI/litgpt/blob/f241d94df59d82b2017bfdcd3800ac8779eb45f5/chat/base.py#L369 We also add the new line.

awaelchli commented 4 months ago

Issue 1 is also ok I think. LitGPT's tokenizer adds bos if the tokenizer config has this: https://huggingface.co/google/gemma-2b-it/blob/718cb189da9c5b2e55abe86f2eeffee9b4ae0dad/tokenizer_config.json#L2

Andrei-Aksionov commented 4 months ago

Issue 2 is ok I think:

Yep. Maybe it's a type in the technical report

since here model's part ends with <end_of_turn>/model, while the prompt should be structured in this way:

<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
{model prompt}<end_of_turn>

meaning that right after <end_of_turn> should be only <start_of_turn.

Anyway, our prompt doesn't have this issue and this is all what we can control.

Andrei-Aksionov commented 4 months ago

Issue 1 is also ok I think. LitGPT's tokenizer adds bos if the tokenizer config has this: https://huggingface.co/google/gemma-2b-it/blob/718cb189da9c5b2e55abe86f2eeffee9b4ae0dad/tokenizer_config.json#L2

Yep, the config has "add_bos_token": true and in that case our tokenizer adds it: https://github.com/Lightning-AI/litgpt/blob/f241d94df59d82b2017bfdcd3800ac8779eb45f5/lit_gpt/tokenizer.py#L96-L100

awaelchli commented 4 months ago

Yes that's what the blog you linked says, that there is a typo in the technical report. So we should not do what is in the technical report of Google. That's my understanding.