declare-lab / instruct-eval

This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
https://declare-lab.github.io/instruct-eval/
Apache License 2.0

Add config to save eval results #12

Open arthurtobler opened 1 year ago

arthurtobler commented 1 year ago

Thanks for this neat repo, it's very convenient for evaluating LLMs!

As a feature request, I would like to suggest adding an option to save the results of an evaluation for the implemented tasks, to allow for easier analytics. My understanding is that the current main.py only prints results. It would also be useful to store per-sub-task scores for tasks like MMLU or BBH.
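
As a rough illustration, something like the sketch below is what I have in mind. The `save_results` helper, the `output_dir` default, and the shape of the results dict are all hypothetical, not the repo's actual API:

```python
# Hypothetical sketch: a small helper that main.py could call after each
# evaluation to persist scores instead of only printing them.
import json
import os
from datetime import datetime


def save_results(task: str, results: dict, output_dir: str = "outputs") -> str:
    """Write a results dict (overall and per-sub-task scores) to a JSON file."""
    os.makedirs(output_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = os.path.join(output_dir, f"{task}-{timestamp}.json")
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
    return path


# Example usage after running e.g. MMLU (results dict shape is illustrative):
# results = {"average": 47.2, "subjects": {"abstract_algebra": 30.0}}
# save_results("mmlu", results)
```

Writing one timestamped JSON file per run would make it easy to aggregate scores across models and tasks later, e.g. with pandas.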