Open Andrei-Aksionov opened 4 months ago
We don't downcast the scale value and keep it in torch.float32
. So this one doesn't affect us.
https://github.com/Lightning-AI/lit-gpt/blob/f241d94df59d82b2017bfdcd3800ac8779eb45f5/lit_gpt/model.py#L90-L91
Was fixed in #1004
Issue 2 is ok I think: https://github.com/Lightning-AI/litgpt/blob/f241d94df59d82b2017bfdcd3800ac8779eb45f5/chat/base.py#L369 We also add the new line.
Issue 1 is also ok I think. LitGPT's tokenizer adds bos if the tokenizer config has this: https://huggingface.co/google/gemma-2b-it/blob/718cb189da9c5b2e55abe86f2eeffee9b4ae0dad/tokenizer_config.json#L2
Issue 2 is ok I think:
Yep. Maybe it's a type in the technical report
since here model's part ends with <end_of_turn>/model
, while the prompt should be structured in this way:
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
{model prompt}<end_of_turn>
meaning that right after <end_of_turn>
should be only <start_of_turn
.
Anyway, our prompt doesn't have this issue and this is all what we can control.
Issue 1 is also ok I think. LitGPT's tokenizer adds bos if the tokenizer config has this: https://huggingface.co/google/gemma-2b-it/blob/718cb189da9c5b2e55abe86f2eeffee9b4ae0dad/tokenizer_config.json#L2
Yep, the config has "add_bos_token": true
and in that case our tokenizer adds it:
https://github.com/Lightning-AI/litgpt/blob/f241d94df59d82b2017bfdcd3800ac8779eb45f5/lit_gpt/tokenizer.py#L96-L100
Yes that's what the blog you linked says, that there is a typo in the technical report. So we should not do what is in the technical report of Google. That's my understanding.
Daniel Han in his blogpost shared his discoveries of what is ~also~ wrong with the Gemma implementation. Some of them are only appliable to Keras, some of them affects PyTorch too. This issue will include information concerning those bugs and ours codebase.
Feel free to chime in.
Thanks to @kashif for letting us know.