UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://UKGovernmentBEIS.github.io/inspect_ai/
MIT License

CoT isn't saved in logs for multiple choice questions #63

Open js-d opened 6 days ago

js-d commented 6 days ago

In the GPQA example, I think multiple_choice(cot=cot, shuffle=True) doesn't work. From looking at the source code, the cot argument doesn't seem to be implemented.

If I instead use multiple_choice(template=MULTIPLE_CHOICE_TEMPLATE_COT), as in the MMLU example, I can't find the model's chain of thought in the logs; I can only see which of the choices the model picked.
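
To illustrate the expected behaviour, here is a minimal sketch in plain Python. It does not use inspect_ai at all: LoggedAnswer and parse_cot_completion are hypothetical names, and the "ANSWER: <letter>" output format is an assumption about what a CoT template might ask the model to produce. The point is only that the picked letter can be extracted while the full completion (including the reasoning) is kept alongside it.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoggedAnswer:
    """Hypothetical record; NOT inspect_ai's actual log schema."""
    choice: Optional[str]  # letter the model picked, e.g. "B"; None if not found
    reasoning: str         # full completion, including the chain of thought


def parse_cot_completion(completion: str) -> LoggedAnswer:
    """Extract the answer letter but keep the whole completion.

    Assumes (hypothetically) that the template tells the model to finish
    with a line such as "ANSWER: B".
    """
    match = re.search(r"ANSWER\s*:\s*([A-Za-z])", completion)
    choice = match.group(1).upper() if match else None
    # Store the untouched completion so the reasoning is not lost.
    return LoggedAnswer(choice=choice, reasoning=completion)


example = parse_cot_completion(
    "Water boils at 100 C at sea level, so the second option fits.\nANSWER: B"
)
print(example.choice)
print(example.reasoning)
```

With something along these lines, a log entry would carry both the selected choice and the text that led to it, rather than the choice alone.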