EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Allow Prefilling Assistant Response w/ Chat Templates #2248

Open haileyschoelkopf opened 2 months ago

haileyschoelkopf commented 2 months ago

Models that are open-source and/or used via local-completions, as well as Claude, allow one to "prefill" the start of the assistant's response to a given input: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response

We don't support this with the current chat templating. We should consider supporting it via a new doc_to_text_response_prefill (name TBD...) field in the task config, for portions of input that are appended after the chat template is applied.
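A hedged sketch of what such a task config could look like — the field name is still TBD per this issue, and the values here are purely illustrative:

```yaml
# Hypothetical task config fragment; doc_to_text_response_prefill is the
# proposed (not yet implemented) field name from this issue.
task: mbpp_chat
doc_to_text: "{{prompt}}"
# Text appended after the chat template opens the assistant turn:
doc_to_text_response_prefill: "The best answer is "
```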

One downside is that models which can't accept such prefilled responses would have to either error out or ignore the prefill when evaluating on a task that uses it, and this would further complicate the construction of contexts. So it's somewhat a tough decision. But Llama-3 uses this for tasks such as evaluation on MBPP, so it's worth considering.

baberabb commented 2 months ago

They also use it for MMLU: 'gen_prefix': 'The best answer is '
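To illustrate the mechanics, here is a minimal sketch of how a prefill like `'The best answer is '` could be spliced in after applying a chat template: the assistant turn is opened but not closed, so generation continues from the prefill text. This assumes a Llama-3-style template; the header and special-token strings below are illustrative, not the harness's actual implementation.

```python
# Hypothetical sketch of prefilling an assistant response.
# The Llama-3-style header/token strings are illustrative assumptions.

def build_prompt(messages, prefill=""):
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each completed turn is closed with an end-of-turn token.
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open the assistant turn but do NOT close it, so the model
    # continues generating from the prefill text.
    parts.append(
        "<|start_header_id|>assistant<|end_header_id|>\n\n" + prefill
    )
    return "".join(parts)

prompt = build_prompt(
    [{"role": "user", "content": "Which option is correct? A, B, C, or D?"}],
    prefill="The best answer is ",
)
```

Models or APIs that reject an unterminated assistant turn would need to error out or drop the prefill, which is exactly the downside raised above.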