Open AarushSah opened 1 week ago
There isn't currently a hook for something like this. If you could describe the specific use case in more detail, we can consider what might work well. Note that we are likely to soon provide some ability to control task execution (have `Task` include a `run()` function you can override, or something comparable with functional hooks), which would probably fit the bill. We've also discussed introducing a results filter that lets you make arbitrary changes to the log file before it's returned/written.
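To make the two ideas above concrete, here is a rough, purely illustrative sketch. `ToyTask`, `ToyLog`, `TimedTask`, `run_with_filter`, and `add_marker` are hypothetical stand-ins invented for this example; none of them are part of the library, and the eventual hook may look quite different.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyLog:
    """Hypothetical stand-in for an eval log (not the real EvalLog)."""
    status: str = "success"
    metrics: dict[str, float] = field(default_factory=dict)


class ToyTask:
    """Hypothetical stand-in for a task with an overridable run()."""

    def run(self) -> ToyLog:
        # A real task would execute the evaluation here.
        return ToyLog()


class TimedTask(ToyTask):
    """Option 1: override run() and post-process the log it produces."""

    def run(self) -> ToyLog:
        start = time.perf_counter()
        log = super().run()
        log.metrics["elapsed_seconds"] = time.perf_counter() - start
        return log


def run_with_filter(task: ToyTask, results_filter: Callable[[ToyLog], ToyLog]) -> ToyLog:
    """Option 2: pass the finished log through a filter before returning it."""
    return results_filter(task.run())


def add_marker(log: ToyLog) -> ToyLog:
    # Example filter: attach an extra metric to the log.
    log.metrics["filtered"] = 1.0
    return log
```

With the override approach the post-processing travels with the task definition, so it would run however the task is launched; with the filter approach the task stays unchanged and the post-processing is supplied separately.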
The goal is to provide more detailed speed metrics from timestamps and token counts available in the EvalLog. The run feature would be awesome - and being able to write the new metrics to the logfile would be awesome as well.
I wonder if some of this you could just do in a solver or scorer? Exactly which data structures are you wanting to access and compute on?
- `EvalLog.results`:
  - `total_samples`: Total number of samples in the evaluation
  - `completed_samples`: Number of successfully completed samples
- `EvalLog.samples`: Contains individual sample data, where each sample has:
  - `events`: List of events for each sample, containing:
    - `event`: Type of event (e.g., 'sample_init', 'step', 'model')
    - `timestamp`: When the event occurred
    - `call.response['usage']` contains:
      - `prompt_time`: Time taken for prompt processing
      - `completion_time`: Time taken for completion generation
      - `total_time`: Total processing time
      - `prompt_tokens`: Number of input tokens
      - `completion_tokens`: Number of output tokens
      - `total_tokens`: Total tokens processed
- `EvalLog.eval`:
  - `model`: Name of the model being evaluated
- `EvalLog.plan.config`:
  - `seed`: Seed used

Hey @jjallaire! Just following up - is it possible to access all of the above from a scorer or a solver? All of this data is needed for what I'm computing.
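For reference, the kind of computation over the usage fields listed above might look roughly like the sketch below. It assumes the events have already been turned into plain dictionaries that mirror those field names; the real log objects may expose them as attributes instead. `usages_from_events` and `speed_metrics` are hypothetical helper names, not library functions.

```python
from typing import Any, Iterable, Mapping

# Usage fields named in the list above.
USAGE_KEYS = (
    "prompt_tokens", "completion_tokens", "total_tokens",
    "prompt_time", "completion_time", "total_time",
)


def usages_from_events(events: Iterable[Mapping[str, Any]]) -> list[Mapping[str, Any]]:
    """Collect the usage dict from every 'model' event that has one."""
    usages = []
    for event in events:
        if event.get("event") != "model":
            continue
        # Assumed shape: event["call"]["response"]["usage"]
        response = (event.get("call") or {}).get("response") or {}
        usage = response.get("usage")
        if usage:
            usages.append(usage)
    return usages


def speed_metrics(usages: Iterable[Mapping[str, Any]]) -> dict[str, float]:
    """Sum tokens and times across calls, then derive tokens-per-second rates."""
    totals = {key: 0.0 for key in USAGE_KEYS}
    for usage in usages:
        for key in USAGE_KEYS:
            totals[key] += float(usage.get(key) or 0)

    def rate(tokens: float, seconds: float) -> float:
        # Guard against missing or zero timings.
        return tokens / seconds if seconds > 0 else 0.0

    return {
        "prompt_tokens_per_second": rate(totals["prompt_tokens"], totals["prompt_time"]),
        "completion_tokens_per_second": rate(totals["completion_tokens"], totals["completion_time"]),
        "total_tokens_per_second": rate(totals["total_tokens"], totals["total_time"]),
    }
```

Summing tokens and times before dividing gives a rate weighted by call length, rather than an average of per-call rates; either could be reasonable depending on what the metric is meant to show.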
Hi! Is there an easy way to run code on the output of a Task from within a Task declaration? Currently, I'm doing something along the lines of this:
but I'd like to be able to define some code to run on the output of the eval within the function that defines the task, so that `process_eval` runs even when I call `my_eval` with the CLI. Is there any native way I can do that?

Thanks in advance!