EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License
6.24k stars 1.65k forks source link

Evaluate Gemma with Chat Template #2069

Open pyf98 opened 1 month ago

pyf98 commented 1 month ago

Hi, I'm trying to evaluate gemma-it models from Hugging Face on MMLU. When I set --apply_chat_template --fewshot_as_multiturn, the tokenizer will raise an error below. This is because Gemma does not support system messages: https://huggingface.co/google/gemma-2b-it/blob/main/tokenizer_config.json#L1507

jinja2.exceptions.TemplateError: System role not supported

What is the best way to evaluate Gemma chat models? Should I use chat templating or not? Should I remove the description for each document or move the description to the first user turn instead of the system prompt? Thank you for any help!

haileyschoelkopf commented 1 month ago

Hi, this is blocked on #2058 , I'll get that merged ASAP!

pyf98 commented 1 month ago

Thanks @haileyschoelkopf for your reply! It seems that we will just remove the system instruction for Gemma-style models. I'm wondering if it hurts the performance of Gemma chat models? I guess many benchmarks provide a brief description at the beginning, but those will be removed.