UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://UKGovernmentBEIS.github.io/inspect_ai/
MIT License

multi epoch runs do not report per epoch metrics #26

Open sohaibimran7 opened 1 month ago

sohaibimran7 commented 1 month ago

The ability to retrieve per-epoch scores for all metrics would be helpful, e.g. to calculate metric variance across epochs. Is there a way to retrieve or easily calculate per-epoch metrics?

aisi-inspect commented 1 month ago

There isn't currently a ready-made way to do this (but I agree there should be!). In the meantime, you could read the samples directly from the log file (https://ukgovernmentbeis.github.io/inspect_ai/eval-logs.html#evallog). Each sample has an epoch field, which you could use to group the scores and compute per-epoch metrics manually.
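The manual workaround described above might look roughly like the sketch below. It is not Inspect's own API: `Sample` is a hypothetical stand-in for a logged sample, and it assumes each sample's score has already been extracted as a number (`value`). Only the `epoch` field is taken from the thread; how samples are loaded from the log and how a score is turned into a number are left out and would need to follow the eval log documentation linked above.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class Sample:
    """Hypothetical stand-in for a logged sample (not an Inspect class)."""
    epoch: int    # the epoch field mentioned in the thread
    value: float  # assumption: the score already converted to a number


def per_epoch_means(samples):
    """Group sample values by epoch and return {epoch: mean value}."""
    by_epoch = defaultdict(list)
    for sample in samples:
        by_epoch[sample.epoch].append(sample.value)
    return {epoch: mean(values) for epoch, values in sorted(by_epoch.items())}


def variance_across_epochs(samples):
    """Population variance of the per-epoch means."""
    return pvariance(list(per_epoch_means(samples).values()))


# Illustrative data only: two epochs with two samples each.
samples = [
    Sample(epoch=1, value=1.0),
    Sample(epoch=1, value=0.0),
    Sample(epoch=2, value=1.0),
    Sample(epoch=2, value=1.0),
]
epoch_means = per_epoch_means(samples)
epoch_variance = variance_across_epochs(samples)
```

Here the mean is used as the metric; any other aggregate could be swapped in at the same place, since the grouping by epoch is the only part the workaround depends on.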