Saibo-creator opened 4 months ago
The problem seems to stem from the default padding configuration of the Llama Tokenizer, which is set to "left" padding instead of the more common "right" padding used by most large language model (LLM) tokenizers.
from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.padding_side
#'right'
A straightforward solution is to adjust the padding side of the Llama tokenizer by adding the line llama_tokenizer.padding_side = "right".
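For reference, a minimal sketch of the fix (the checkpoint name meta-llama/Llama-2-7b-hf is just an example; any Llama tokenizer should show the same default):
from transformers import AutoTokenizer
llama_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
llama_tokenizer.padding_side
# 'left'
llama_tokenizer.padding_side = "right"  # switch to the right padding used by most LLM tokenizers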
However, it's not yet clear which specific part of the code is affected by this setting. I plan to look into this further. For now, the fix above is effective, and this issue seems to impact only Llama models.
Note: The LLAMA-3 model already defaults the padding side to "right".
But it seems that left padding is the right way to go for generation; otherwise we lose performance. https://huggingface.co/docs/transformers/llm_tutorial#wrong-padding-side
We can discuss this further. @Yuxing0610
I am also running into this issue.
Padding on the left is the right way to go for batch processing of inputs (sending multiple sequences at a time), since each input needs to be the same length. If we pad on the right, a number of <|eot_id|> tokens will follow the assistant message, and the model will not generate anything.
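To illustrate, here is a rough sketch of batched generation with left padding (the checkpoint name and prompts below are placeholders, not taken from this issue):
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

prompts = ["Write a haiku about padding.", "1 + 1 ="]
# With left padding, pad tokens sit in front of each prompt, so every sequence
# ends with real prompt tokens and generation continues directly from them.
inputs = tokenizer(prompts, padding=True, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
# With padding_side="right", the shorter prompts end in pad/<|eot_id|> tokens,
# and the model tends to generate nothing useful after them.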
Reproduce
Context
saibo/llama-1B is a randomly initialized model for debugging purposes. Although it is not a trained LLM, it should still be forced to generate some structure, but it is failing.