This is a great question. Some chat models are fine-tuned with a chat template during alignment, so the model sees interactions like:
User: I have an instruction. Here's an example: input A, output A.
Assistant: OK.
User: Here's another example: input B, output B.
Assistant: OK.
User: input C, what's the output of input C?
In our few-shot setup, we include the examples directly in the input text. We do this to keep the prompt format consistent across chat models, base models, and instruct models that do not support multi-turn dialogue, which makes them easier to compare. However, we have not run experiments comparing the final metrics with and without the chat template; we leave this to future work.
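For concreteness, here is a minimal sketch of the in-text few-shot format described above. The helper name and the exact prompt wording are illustrative, not the actual code in evaluate_from_local.py:

```python
# Illustrative sketch of the "single string" few-shot prompt; the function
# name and prompt wording are assumptions, not the repo's actual code.
def build_plain_fewshot_prompt(instruction, examples, query):
    """Concatenate the instruction, in-context examples, and query into one string."""
    parts = [instruction]
    for inp, out in examples:
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = build_plain_fewshot_prompt(
    "Answer the question.",
    [("input A", "output A"), ("input B", "output B")],
    "input C",
)
# The resulting string is then tokenized directly, e.g.:
# input_ids = tokenizer(prompt, return_tensors="pt").input_ids
```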
Hi, thanks for open-sourcing the dataset. In evaluate_from_local.py, the few-shot prompt is created as a single string and tokenized directly. But the HF tokenizer has a chat_template, as demonstrated in the LLama3-70B HF Readme, where we can use a system -> user -> assistant style chat to build the few-shot prompt. Is there any reason why this is not used? Do you know the difference in the final metric with and without the chat template? Thanks.
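For reference, this is a hedged sketch of the chat-template alternative the question refers to, assuming a transformers tokenizer that ships a chat_template (the model ID and example texts are placeholders):

```python
from transformers import AutoTokenizer

# Placeholder model ID; any tokenizer with a chat_template works the same way.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")

# Few-shot examples expressed as multi-turn chat messages.
messages = [
    {"role": "system", "content": "Answer the question."},
    {"role": "user", "content": "Here's an example: input A"},
    {"role": "assistant", "content": "output A"},
    {"role": "user", "content": "Here's another example: input B"},
    {"role": "assistant", "content": "output B"},
    {"role": "user", "content": "input C"},
]

# Render the multi-turn few-shot prompt with the model's chat template and
# append the assistant header so the model continues with the answer.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
```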