UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://inspect.ai-safety-institute.org.uk/
MIT License

Migrate hellaswag benchmark to Inspect Evals #601

Closed dragonstyle closed 2 days ago

jjallaire-aisi commented 2 days ago

Reminder to pick up the moved evals in examples.yaml as well.

MSchmatzAISI commented 2 days ago

Quick question - why are these being moved into src/inspect_evals? Is the plan to move every eval there after some quality checks?

jjallaire-aisi commented 2 days ago

inspect_evals is a proper Python module, so it is easier to manage dependencies for it than for standalone source files. Further, the evals can now be run by referencing their name rather than their filesystem path:

inspect eval inspect_evals/drop 

It's also easier for other Python code to import and use the tasks when they are in a proper module.
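
As a rough sketch of what importing a task from the module could look like, the snippet below loads the drop task and runs it through Inspect's Python `eval()` function. The import path `inspect_evals.drop`, the helper names `inspect_packages_available` and `run_drop`, and the `limit` default are assumptions for illustration, not something confirmed in this thread.

```python
import importlib.util


def inspect_packages_available() -> bool:
    """Return True if both inspect_ai and inspect_evals are importable."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("inspect_ai", "inspect_evals")
    )


def run_drop(model: str, limit: int = 5):
    """Run a few samples of the drop task against the given model.

    Hypothetical helper: assumes the task is exposed as
    `inspect_evals.drop.drop`.
    """
    # Imported lazily so this file still loads when the packages are missing.
    from inspect_ai import eval
    from inspect_evals.drop import drop  # assumed import path

    # eval() accepts task objects directly; `limit` caps the sample count.
    return eval(drop(), model=model, limit=limit)
```

With both packages installed and a model provider configured, calling `run_drop("openai/gpt-4o")` (the model name is only a placeholder) should return the eval logs for the run. The same pattern would apply to any other task in the module.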

The plan is that everything goes into inspect_evals (there will be no more standalone evals).