In the GPQA example, I think `multiple_choice(cot=cot, shuffle=True)` doesn't work. From looking at the source code, the `cot` argument doesn't seem to be implemented. Is that right?
If I instead use `multiple_choice(template=MULTIPLE_CHOICE_TEMPLATE_COT)`, as in the MMLU example, I can't find the model's chain of thought in the logs; I can only see which of the multiple choices the model picked.
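For context, this is a minimal sketch of what keeping the reasoning would involve. It assumes the chain-of-thought template asks the model to end its response with a line of the form `ANSWER: $LETTER`. `split_cot` is a hypothetical helper written for illustration, not part of Inspect's API; it splits a raw completion into the reasoning text and the chosen letter, so both could be recorded instead of only the letter.

```python
import re
from typing import Optional, Tuple

# Assumption: the CoT template tells the model to finish with "ANSWER: $LETTER".
# Matches a line consisting of "ANSWER:" followed by a single capital letter.
ANSWER_RE = re.compile(r"ANSWER\s*:\s*([A-Z])\s*$", re.MULTILINE)


def split_cot(completion: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (reasoning, answer_letter) from a raw completion."""
    matches = list(ANSWER_RE.finditer(completion))
    if not matches:
        # No answer line found: keep the whole completion as reasoning.
        return completion.strip(), None
    # Use the last answer line, since the template asks for it at the end.
    last = matches[-1]
    reasoning = completion[: last.start()].strip()
    return reasoning, last.group(1)
```

For example, `split_cot("Energy is conserved, so...\nANSWER: C")` returns the reasoning line together with `"C"`, rather than discarding everything except the letter.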