The idea behind multi_scorer is that you call it from another scorer (it is not itself intended as a standalone scorer). One reason for this is that we can't know prima facie what the right metrics would be (i.e. is it a correct/incorrect score, or is it a continuous range, which would call for mean()?). We will definitely add something to the docs to clarify this.
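For illustration, here is a minimal sketch of the two cases. It assumes the names used in this thread (scorer, multi_scorer, majority_vote, accuracy, bootstrap_std) plus mean() are importable from inspect_ai.scorer; discrete_multi and continuous_multi are hypothetical wrappers:

```python
from inspect_ai.scorer import (
    accuracy,
    bootstrap_std,
    majority_vote,
    mean,
    multi_scorer,
    scorer,
)

# Hypothetical wrapper: the reduced scores are correct/incorrect,
# so accuracy-style metrics make sense.
@scorer(metrics=[accuracy(), bootstrap_std()])
def discrete_multi(scorers, reducer=majority_vote):
    return multi_scorer(scorers, reducer)

# Hypothetical wrapper: the reduced scores are continuous,
# so mean() is the appropriate metric instead.
@scorer(metrics=[mean()])
def continuous_multi(scorers, reducer):
    return multi_scorer(scorers, reducer)
```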
Thanks for clarifying that. However, when I run something like:
```python
@scorer(metrics=[accuracy(), bootstrap_std()])
def custom_multi_scorer(
    scorers=[
        match("end"),
        answer("word"),
        model_graded_fact(),
    ],
    reducer=majority_vote
):
    return multi_scorer(scorers, reducer)


@task
def fn_name():
    return Task(
        dataset=example_dataset("theory_of_mind"),
        plan=[
            chain_of_thought(),
            generate(),
        ],
        scorer=custom_multi_scorer,
    )
```
I get:
```
Traceback (most recent call last)
  /opt/anaconda3/envs/llm-awareness/lib/python3.10/site-packages/inspect_ai/_eval/task/run.py:165 in task_run
  /opt/anaconda3/envs/llm-awareness/lib/python3.10/site-packages/inspect_ai/_eval/task/run.py:266 in task_run_sample
TypeError: object function can't be used in 'await' expression
```
I think you just need to do this: `scorer=custom_multi_scorer()` (i.e. call the custom_multi_scorer() function, so that Task receives the Scorer the call returns rather than the function itself).
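Applied to the Task above, that is:

```python
@task
def fn_name():
    return Task(
        dataset=example_dataset("theory_of_mind"),
        plan=[
            chain_of_thought(),
            generate(),
        ],
        # call the factory so Task receives the Scorer it returns
        scorer=custom_multi_scorer(),
    )
```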
That does indeed work! Thanks
Quick fix: adding the following above line 19 in inspect_ai/scorer/_multi.py: