EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

bbh_zeroshot fails due to a custom filter issue. #2422

Open shamanez opened 1 month ago

shamanez commented 1 month ago

I am trying to run this setup:

lm_eval --model vllm \
    --model_args pretrained="Qwen/Qwen2.5-0.5B-Instruct",tensor_parallel_size=2,dtype=auto,gpu_memory_utilization=0.8 \
    --tasks bbh_zeroshot_snarks \
    --batch_size auto

But it gives me the following error:

    2024-10-23:02:50:28,876 WARNING [registry.py:192] filter <class 'utils.MultiChoiceRegexFilter'> is not registered!
    rank0: Traceback (most recent call last):
    rank0:   File "/home/ubuntu/miniconda3/envs/entropix/bin/lm_eval", line 8, in <module>
    rank0:   File "/home/ubuntu//lm-evaluation-harness/lm_eval/__main__.py", line 383, in cli_evaluate
    rank0:     results = evaluator.simple_evaluate(
    rank0:   File "/home/ubuntu//lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    rank0:     return fn(*args, **kwargs)
    rank0:   File "/home/ubuntu//lm-evaluation-harness/lm_eval/evaluator.py", line 235, in simple_evaluate
    rank0:     task_dict = get_task_dict(tasks, task_manager)
    rank0:   File "/home/ubuntu//lm-evaluation-harness/lm_eval/tasks/__init__.py", line 620, in get_task_dict
    rank0:     task_name_from_string_dict = task_manager.load_task_or_group(
    rank0:   File "/home/ubuntu//lm-evaluation-harness/lm_eval/tasks/__init__.py", line 415, in load_task_or_group
    rank0:     collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
    rank0:   File "/home/ubuntu//lm-evaluation-harness/lm_eval/tasks/__init__.py", line 314, in _load_individual_task_or_group
    rank0:     return _load_task(task_config, task=name_or_config)
    rank0:   File "/home/ubuntu//lm-evaluation-harness/lm_eval/tasks/__init__.py", line 280, in _load_task
    rank0:     task_object = ConfigurableTask(config=config)
    rank0:   File "/home/ubuntu//lm-evaluation-harness/lm_eval/api/task.py", line 833, in __init__
    rank0:     filter_pipeline = build_filter_ensemble(filter_name, components)
    rank0:   File "/home/ubuntu//lm-evaluation-harness/lm_eval/filters/__init__.py", line 22, in build_filter_ensemble
    rank0:     f = partial(get_filter(function), **kwargs)
    rank0: TypeError: the first argument must be callable

I can see that this file (utils.py) implements the filter class MultiChoiceRegexFilter.
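The warning and the final TypeError fit together if the filter lookup is keyed by registered names: the !function tag has already turned the YAML value into the class object itself, that object is not a key in the registry, the lookup warns and returns nothing, and functools.partial then rejects the non-callable. The sketch below reproduces that chain under those assumptions. FILTER_REGISTRY, get_filter_strict, get_filter_tolerant and build_filter are hypothetical stand-ins, not the harness's code, and the tolerant lookup is only one possible fix.

```python
import logging
from functools import partial

logger = logging.getLogger("filter_registry_sketch")

# Hypothetical registry keyed by names (strings), standing in for the real one.
FILTER_REGISTRY = {}


class TakeFirstFilter:
    """Toy registered filter: keep only the first response per document."""

    def apply(self, resps):
        return [r[0] for r in resps]


FILTER_REGISTRY["take_first"] = TakeFirstFilter


class MultiChoiceRegexFilter:
    """Stand-in for the class that '!function utils.MultiChoiceRegexFilter' yields."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


def get_filter_strict(filter_name):
    """Name-only lookup: a class object is never a key, so it warns and returns None."""
    try:
        return FILTER_REGISTRY[filter_name]
    except KeyError:
        logger.warning("filter %s is not registered!", filter_name)
        return None


def get_filter_tolerant(filter_name):
    """One possible fix: pass through objects that are already callable."""
    if callable(filter_name):
        return filter_name
    return get_filter_strict(filter_name)


def build_filter(lookup, function, **kwargs):
    # Mirrors the failing line in the traceback: partial(get_filter(function), **kwargs)
    return partial(lookup(function), **kwargs)


if __name__ == "__main__":
    # A registered name resolves to the registered TakeFirstFilter class.
    print(build_filter(get_filter_strict, "take_first").func)
    try:
        # A class object misses the name-keyed registry, so partial receives None.
        build_filter(get_filter_strict, MultiChoiceRegexFilter, ignore_case=True)
    except TypeError as err:
        print("TypeError:", err)  # TypeError: the first argument must be callable
    fixed = build_filter(get_filter_tolerant, MultiChoiceRegexFilter, ignore_case=True)
    print(fixed.func, fixed.keywords)
```

Accepting callables before falling back to the name lookup keeps registered names such as take_first working unchanged, since a plain string is never callable.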

@jungwhank @haileyschoelkopf

philheller commented 5 days ago

Hi there! First, I want to say thank you for this project. It's awesome! 🚀 I am currently running into the same issue. I was surprised to see custom code in the function attribute of the filters in the zeroshot configs of the bbh tasks. I have read through the docs multiple times, but as the docs specify here, embedded Python code for custom processing is only available for a limited set of arguments, and attributes such as this one are not among those listed. I am not sure whether I missed something here, @haileyschoelkopf and @lintangsutawika.
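For context on what the !function tag does: it lets a YAML value name a Python object as module.attribute, loaded from a .py file next to the YAML file. The sketch below shows how such a resolver could work with only the standard library. resolve_function_ref is a hypothetical name, and the harness's real loader (hooked into its YAML parsing) may differ in details. The relevant point is that the result is the object itself, so a filter config written this way hands a class, not a registry name, to whatever consumes the function field.

```python
import importlib.util
import os


def resolve_function_ref(ref, yaml_dir):
    """Resolve a reference like 'utils.MultiChoiceRegexFilter' relative to yaml_dir.

    Hypothetical helper: everything before the last dot names a .py file in
    yaml_dir, and the last component is the attribute fetched from that file.
    """
    *module_parts, attr_name = ref.split(".")
    if not module_parts:
        raise ValueError(f"expected 'module.attribute', got {ref!r}")
    module_name = ".".join(module_parts)
    module_path = os.path.normpath(os.path.join(yaml_dir, f"{module_name}.py"))

    # Load the file as a standalone module, outside of any package.
    spec = importlib.util.spec_from_file_location(module_name, module_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, attr_name)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Write a throwaway utils.py that defines the class being referenced.
        with open(os.path.join(tmp, "utils.py"), "w") as fh:
            fh.write("class MultiChoiceRegexFilter:\n    pass\n")
        obj = resolve_function_ref("utils.MultiChoiceRegexFilter", tmp)
        print(obj)  # <class 'utils.MultiChoiceRegexFilter'>
```

The printed class name has the same form as the one in the "is not registered!" warning from the original report, which is consistent with the registry receiving the resolved class rather than a string.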

Further, I wondered why the filters are handled so differently across the task groups. If I am not mistaken, they are applied as follows (taking the snarks.yml task in each group as the example):

| group | task_config | included_template | filters defined in | filter pipelines |
|---|---|---|---|---|
| bbh_zeroshot | here | here | task_config | take_first (first pipeline); custom !function utils.MultiChoiceRegexFilter & take_first (second pipeline) |
| bbh_fewshot | here | here | none | none |
| bbh_cot_zeroshot | here | here | task_config | custom !function utils.MultiChoiceRegexFilter & take_first (first pipeline); regex & take_first (second pipeline) |
| bbh_cot_fewshot | here | here | included_template | regex & take_first (single pipeline) |

This makes the processing more opaque, at least to me.
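To make the "pipeline" wording in the table concrete, here is a simplified model in plain Python: each named pipeline is an ordered chain of steps applied to the same raw responses, and separate pipelines produce separate results. This is not the harness's Filter API; regex_extract, take_first, run_pipeline, the pipeline names "strict" and "flexible", and the (A)-style pattern are all illustrative assumptions.

```python
import re
from typing import Callable, Dict, List

# Each document maps to a list of model responses (one per sample).
Responses = List[List[str]]
Step = Callable[[Responses], Responses]


def regex_extract(pattern: str, fallback: str = "[invalid]") -> Step:
    """Hypothetical step: replace each response with its first regex match."""
    compiled = re.compile(pattern)

    def step(resps: Responses) -> Responses:
        out = []
        for doc_resps in resps:
            extracted = []
            for r in doc_resps:
                m = compiled.search(r)
                extracted.append(m.group(0) if m else fallback)
            out.append(extracted)
        return out

    return step


def take_first(resps: Responses) -> Responses:
    """Keep only the first response per document (still wrapped in a list)."""
    return [doc_resps[:1] for doc_resps in resps]


def run_pipeline(steps: List[Step], resps: Responses) -> Responses:
    # Apply the steps in order; each step sees the previous step's output.
    for step in steps:
        resps = step(resps)
    return resps


# Two independent pipelines over the same raw responses, loosely modeled on
# the "first pipeline / second pipeline" layout described in the table.
PIPELINES: Dict[str, List[Step]] = {
    "strict": [take_first],
    "flexible": [regex_extract(r"\([A-Z]\)"), take_first],
}

raw = [["The answer is (A).", "(B)"], ["(B)"]]
results = {name: run_pipeline(steps, raw) for name, steps in PIPELINES.items()}
# results["strict"]   -> [["The answer is (A)."], ["(B)"]]
# results["flexible"] -> [["(A)"], ["(B)"]]
```

Seen this way, the groups in the table differ mainly in which steps they chain and in which file the chain is declared.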

Cheers!