Models that are open-source and/or used via `local-completions`, as well as Claude, allow one to "prefill" the start of the assistant's response to a given input: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response
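
For reference, here is a minimal sketch of what prefilling looks like against the Anthropic Messages API described in the link above. The prompt, model name, and prefill text are placeholders, not anything from an existing task:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # illustrative; any Messages API model
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that checks whether a number is prime.",
        },
        # Ending the conversation on an assistant turn makes Claude continue
        # from this text instead of starting its response from scratch.
        {"role": "assistant", "content": "def is_prime(n):"},
    ],
)

print(message.content[0].text)  # the completion picks up right after the prefill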
We don't currently support this with the current chat templating. We should consider supporting it via a new `doc_to_text_response_prefill` (name TBD...) field in the task config file, for portions of the input that are post-pended after the chat template is applied (sketched below).

One downside is that models which can't accept such prefilled responses would have to either error out or ignore the prefill when evaluating on a task that uses it, and this would further complicate the construction of contexts. So it's somewhat a tough decision. But Llama-3 uses this for tasks such as MBPP evaluation, so it's worth considering.
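
Concretely, the context construction could look something like the sketch below, assuming the proposed field resolves to a plain string. The field name, the checkpoint, and the prefill text are all illustrative, not settled API:

```python
from transformers import AutoTokenizer

# Hypothetical: pretend the task YAML carried
#   doc_to_text_response_prefill: "Here is the completed function:\n\ndef "
# (field name TBD, per the proposal above).
prefill = "Here is the completed function:\n\ndef "

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {
        "role": "user",
        "content": "Write a function to find the shared elements from the given two lists.",
    },
]

# Render the chat template with the assistant header / generation prompt,
# then post-pend the prefill so generation continues mid-assistant-turn.
context = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
context += prefill
```

Models served through `local-completions` could consume this string directly; chat-only APIs that reject a trailing assistant turn are the ones that would need to error out or drop the prefill.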