UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://inspect.ai-safety-institute.org.uk/
MIT License

Using multi_scorer raises AttributeError #23

Closed sohaibimran7 closed 4 months ago

sohaibimran7 commented 4 months ago
Traceback (most recent call last)
/opt/anaconda3/envs/<my_env>/lib/python3.10/site-packages/inspect_ai/_eval/task/run.py:182 in task_run
/opt/anaconda3/envs/<my_env>/lib/python3.10/site-packages/inspect_ai/_eval/task/results.py:27 in eval_results
/opt/anaconda3/envs/<my_env>/lib/python3.10/site-packages/inspect_ai/_util/registry.py:179 in registry_info

AttributeError: 'function' object has no attribute '__registry_info__'

Quick fix:

Adding:

@scorer(metrics=[accuracy(), bootstrap_std()])

above line 19 in inspect_ai/scorer/_multi.py:

def multi_scorer(scorers: list[Scorer], reducer: ScoreReducer) -> Scorer:

aisi-inspect commented 4 months ago

The idea behind multi_scorer is that you call it from another scorer (it is not itself intended as a standalone scorer). One reason for this is that we can't know prima facie what the right metrics would be (i.e. whether it is a correct/incorrect score or a continuous range, which would call for mean()). We will definitely add something to the docs to clarify this.
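
To illustrate the point about metrics, here is a minimal pure-Python sketch of combining several scorers through a reducer. It does not use inspect_ai: `combine`, `majority`, and the small scorer functions are hypothetical stand-ins, and real Inspect scorers are async and return Score objects. The aim is only to show that the same combining step can yield either a categorical value or a continuous one, so the choice of metrics has to be made by the outer scorer that wraps it.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Sequence

# Hypothetical stand-ins for illustration; none of these names come from inspect_ai.
SimpleScorer = Callable[[str, str], object]


def combine(scorers: Sequence[SimpleScorer], reducer: Callable[[list], object]) -> SimpleScorer:
    """Run every scorer on the same answer and reduce the results to one value."""
    def score(answer: str, target: str) -> object:
        return reducer([s(answer, target) for s in scorers])
    return score


def majority(values: list) -> object:
    """Return the most common value; suits categorical scores such as "C" / "I"."""
    return Counter(values).most_common(1)[0][0]


# Categorical scorers: each returns "C" (correct) or "I" (incorrect).
def exact(answer: str, target: str) -> str:
    return "C" if answer.strip() == target else "I"


def contains(answer: str, target: str) -> str:
    return "C" if target in answer else "I"


def ends_with(answer: str, target: str) -> str:
    return "C" if answer.rstrip(".").endswith(target) else "I"


# Continuous scorers: each returns a float between 0 and 1.
def word_overlap(answer: str, target: str) -> float:
    a, t = set(answer.lower().split()), set(target.lower().split())
    return len(a & t) / len(t) if t else 0.0


def length_ratio(answer: str, target: str) -> float:
    return min(len(answer), len(target)) / max(len(answer), len(target), 1)


# Same combining step, two different kinds of result.
categorical = combine([exact, contains, ends_with], majority)  # accuracy-style metric fits
continuous = combine([word_overlap, length_ratio], mean)       # mean-style metric fits
```

In this sketch `categorical(...)` returns a label while `continuous(...)` returns a float, which is why a fixed set of metrics attached to the combining function itself would be wrong for one of the two cases.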

sohaibimran7 commented 4 months ago

Thanks for clarifying that. However, when I run something like:

@scorer(metrics=[accuracy(), bootstrap_std()])
def custom_multi_scorer(
    scorers=[
        match("end"),
        answer("word"),
        model_graded_fact(),
    ],
    reducer=majority_vote,
):
    return multi_scorer(scorers, reducer)

@task
def fn_name():
    return Task(
        dataset=example_dataset("theory_of_mind"),
        plan=[
            chain_of_thought(),
            generate(),
        ],
        scorer=custom_multi_scorer,
    )

I get:

Traceback (most recent call last)
/opt/anaconda3/envs/llm-awareness/lib/python3.10/site-packages/inspect_ai/_eval/task/run.py:165 in task_run
/opt/anaconda3/envs/llm-awareness/lib/python3.10/site-packages/inspect_ai/_eval/task/run.py:266 in task_run_sample
TypeError: object function can't be used in 'await' expression

aisi-inspect commented 4 months ago

I think you just need to do this: scorer=custom_multi_scorer() (i.e. call the custom_multi_scorer() function)
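
For anyone hitting the same TypeError, the following is a simplified sketch of why passing the factory rather than its result fails. It does not use inspect_ai: `make_scorer` and `run_sample` are hypothetical stand-ins for a scorer factory with defaulted parameters and for the framework code that awaits the scorer. The assumption is that the framework calls whatever it was given with two arguments and awaits the result.

```python
import asyncio


def make_scorer(ignore_case=True, strip=True):
    """Hypothetical scorer factory: all parameters have defaults, like custom_multi_scorer."""
    async def score(answer: str, target: str) -> bool:
        if strip:
            answer, target = answer.strip(), target.strip()
        if ignore_case:
            answer, target = answer.lower(), target.lower()
        return answer == target
    return score


async def run_sample(scorer, answer: str, target: str):
    """Stand-in for the framework: call the scorer with two arguments and await the result."""
    return await scorer(answer, target)


# Correct: call the factory, so run_sample receives the inner async function.
ok = asyncio.run(run_sample(make_scorer(), " Yes ", "yes"))

# Incorrect: pass the factory itself. run_sample then calls make_scorer(" Yes ", "yes"),
# which silently binds the two arguments to the defaulted parameters and returns a plain
# function. Awaiting that plain function raises TypeError.
try:
    asyncio.run(run_sample(make_scorer, " Yes ", "yes"))
    failed_with = None
except TypeError as exc:
    failed_with = exc
```

Because every parameter of the factory has a default, the mistaken call does not fail at the point where the arguments are passed; the error only surfaces later at the await, which is why the traceback points into the framework rather than at the task definition.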

sohaibimran7 commented 4 months ago

That does indeed work! Thanks