UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://inspect.ai-safety-institute.org.uk/
MIT License
627 stars 118 forks

location property for eval logs #872

Closed jjallaire closed 1 day ago

jjallaire commented 1 day ago

The EvalLog object returned from eval() and read_eval_log() now has a location property that indicates the storage location it was written to or read from.

The write_eval_log() function uses this location when it isn't passed an explicit location to write to. This lets you modify the contents of a log returned from eval() as follows:

from inspect_ai import eval
from inspect_ai.log import write_eval_log

log = eval(my_task())[0]
# edit EvalLog as required
write_eval_log(log)

Or alternatively for an EvalLog read from a filesystem:

from inspect_ai.log import read_eval_log, write_eval_log

log = read_eval_log(log_file_path)
# edit EvalLog as required
write_eval_log(log)
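
The fallback described above (an explicit location wins, otherwise the log's own location is used) can be sketched roughly as follows. This is not Inspect's implementation: LogStub and resolve_write_location are hypothetical names used only for illustration, and raising an error when no location is available is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogStub:
    """Hypothetical stand-in for EvalLog; only the location field matters here."""

    location: Optional[str] = None


def resolve_write_location(log: LogStub, location: Optional[str] = None) -> str:
    """Pick the write target: the explicit argument wins, else the log's own location."""
    target = location if location is not None else log.location
    if target is None:
        # Assumption: with no location available there is nowhere to write.
        raise ValueError("no location given and the log has no location")
    return target
```

With this sketch, resolve_write_location(LogStub("file:///tmp/a.eval")) returns the log's own location, while passing a second argument overrides it.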

If you are working with the results of an Eval Set, the returned logs are headers rather than full logs with all samples. To edit logs returned from eval_set(), read each one in full from its location, edit it, and then write it back. For example:

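from inspect_ai import eval_set
from inspect_ai.log import read_eval_log, write_eval_log

success, logs = eval_set(tasks)

for log in logs:
    # logs from eval_set are headers only, so re-read the full log first
    log = read_eval_log(log.location)
    # edit EvalLog as required
    write_eval_log(log)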

Note that EvalLog.location is a URI rather than a traditional file path (e.g. it could be a file:// URI, an s3:// URI, or any other URI supported by fsspec).
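
Because the location is a URI, code that needs to know where a log lives should parse it rather than treat it as a path. A minimal sketch using only the Python standard library is shown below; split_location is a hypothetical helper (not part of inspect_ai), and it assumes POSIX-style paths, treating a string with no scheme as a local file.

```python
from urllib.parse import urlparse


def split_location(location: str) -> tuple[str, str]:
    """Return (scheme, path) for a log location string."""
    parsed = urlparse(location)
    if parsed.scheme in ("", "file"):
        # Assumption: a bare path with no scheme refers to a local file.
        return "file", parsed.path
    # For remote schemes such as s3://, the bucket or host is in netloc.
    return parsed.scheme, f"{parsed.netloc}{parsed.path}"
```

For example, split_location(log.location) could be used to decide whether a log can be opened with local file tools or needs a remote filesystem client.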