EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Allow Prefilling Assistant Response w/ Chat Templates #2248

Open haileyschoelkopf opened 2 months ago

haileyschoelkopf commented 2 months ago

Models that are open-source and/or used via local-completions, as well as Claude, allow one to "prefill" the start of the assistant's response to a given input: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response

We don't support this with the current chat templating. We should consider supporting it via a new doc_to_text_response_prefill (name TBD...) field in the task config, for portions of input that are appended after the chat template is applied.
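A hedged sketch of what such a task config could look like — the field name is still TBD per this issue, and the values here are purely illustrative:

```yaml
# Hypothetical task config fragment; doc_to_text_response_prefill is the
# proposed (not yet implemented) field name from this issue.
task: mbpp_chat
doc_to_text: "{{prompt}}"
# Text appended after the chat template opens the assistant turn:
doc_to_text_response_prefill: "The best answer is "
```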

One downside is that models which can't accept such prefilled responses would have to either error out or ignore the prefill when evaluating on a task that uses it, and this would further complicate the construction of contexts. So it's somewhat a tough decision. But Llama-3 uses this for tasks such as evaluation on MBPP, so it's worth considering.

baberabb commented 2 months ago

They also use it for MMLU: 'gen_prefix': 'The best answer is '
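To illustrate the mechanics, here is a minimal sketch of how a prefill like `'The best answer is '` could be spliced in after applying a chat template: the assistant turn is opened but not closed, so generation continues from the prefill text. This assumes a Llama-3-style template; the header and special-token strings below are illustrative, not the harness's actual implementation.

```python
# Hypothetical sketch of prefilling an assistant response.
# The Llama-3-style header/token strings are illustrative assumptions.

def build_prompt(messages, prefill=""):
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each completed turn is closed with an end-of-turn token.
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open the assistant turn but do NOT close it, so the model
    # continues generating from the prefill text.
    parts.append(
        "<|start_header_id|>assistant<|end_header_id|>\n\n" + prefill
    )
    return "".join(parts)

prompt = build_prompt(
    [{"role": "user", "content": "Which option is correct? A, B, C, or D?"}],
    prefill="The best answer is ",
)
```

Models or APIs that reject an unterminated assistant turn would need to error out or drop the prefill, which is exactly the downside raised above.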