Open sohaibimran7 opened 1 month ago
There isn't currently a ready-made way to do this (but I agree there should be!). In the meantime, you could read the samples directly from the log file (https://ukgovernmentbeis.github.io/inspect_ai/eval-logs.html#evallog). Each sample has an `epoch` field, which you could use to group the scores and compute per-epoch metrics manually.
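
A minimal sketch of that manual computation, in case it helps. It assumes each sample exposes `epoch` and a `scores` dict keyed by scorer name, whose entries have a `value`. The helper names (`to_float`, `per_epoch_means`, `variance_across_epochs`), the `"C"`/`"I"` mapping, the log path, and the scorer name `"match"` are illustrative assumptions, not part of the library.

```python
from collections import defaultdict
from statistics import mean, pvariance


def to_float(value):
    """Hypothetical helper: map a score value to a number.

    Assumes string scores are "C" (correct) or "I" (incorrect);
    anything else is treated as already numeric.
    """
    if isinstance(value, str):
        return {"C": 1.0, "I": 0.0}[value]
    return float(value)


def per_epoch_means(samples, scorer):
    """Group sample scores by epoch and return {epoch: mean score}."""
    by_epoch = defaultdict(list)
    for sample in samples:
        by_epoch[sample.epoch].append(to_float(sample.scores[scorer].value))
    return {epoch: mean(values) for epoch, values in sorted(by_epoch.items())}


def variance_across_epochs(epoch_means):
    """Population variance of the per-epoch means."""
    return pvariance(list(epoch_means.values()))


# Possible usage with a real log (path and scorer name are placeholders):
# from inspect_ai.log import read_eval_log
# log = read_eval_log("path/to/log.eval")
# means = per_epoch_means(log.samples, "match")
# print(means, variance_across_epochs(means))
```

This computes a mean per epoch only; other metrics would need their own aggregation over the same per-epoch groups.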
The ability to retrieve per-epoch scores for all metrics would be helpful, e.g. to calculate metric variance across epochs. Is there a way to retrieve or easily calculate per-epoch metrics?