bbh_zeroshot fails due to a custom filter issue.

EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

MIT License

    lm_eval --model vllm \
        --model_args pretrained="Qwen/Qwen2.5-0.5B-Instruct",tensor_parallel_size=2,dtype=auto,gpu_memory_utilization=0.8 \
        --tasks bbh_zeroshot_snarks \
        --batch_size auto

Hi there,

First, I want to say thank you for this project. It's awesome! 🚀

I am currently running into the same issue. I was surprised to see a custom code section for the `function` attribute in the zero-shot configs of the BBH tasks. I have read through the docs several times, but as the docs specify here, embedded Python code for custom processing is only available for a limited set of arguments. The attributes of task configs such as this one are not among those listed in the docs. I am not sure whether I missed something here, @haileyschoelkopf and @lintangsutawika.
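To make the concern concrete, here is a minimal sketch of what resolving a dotted reference such as `utils.MultiChoiceRegexFilter` into a Python object could look like. The helper `resolve_function_ref` is hypothetical and is not the harness's actual loader; a standard-library target stands in for the task's `utils` module so the sketch runs anywhere. The point it illustrates is that, once resolved, the config value is a class object rather than a plain string name, so whatever consumes the config has to handle both cases.

```python
import importlib


def resolve_function_ref(ref: str):
    """Resolve a 'module.attr' string into the object it names (hypothetical helper)."""
    module_name, _, attr_name = ref.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attr', got {ref!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in for a config entry like `function: !function utils.MultiChoiceRegexFilter`.
# Assumption: a stdlib class replaces the task-local `utils` module here.
config_entry = {"function": "collections.Counter"}

resolved = resolve_function_ref(config_entry["function"])

# After resolution the value is a class, not a string, and can be instantiated directly.
instance = resolved("abca")
```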

Further, I wondered why the tasks' filters are treated so differently. If I am not mistaken, the filters are applied like this (using the `snarks.yml` task in each group as the example):

| group | task_config | included_template | filter_option_in | filters |
|---|---|---|---|---|
| `bbh_zeroshot` | here | here | `task_config` | `take_first` (in first pipeline); custom `!function utils.MultiChoiceRegexFilter` & `take_first` (in second pipeline) |
| `bbh_fewshot` | here | here | `none` | `none` |
| `bbh_cot_zeroshot` | here | here | `task_config` | custom `!function utils.MultiChoiceRegexFilter` & `take_first` (in first pipeline); `regex` & `take_first` (in second pipeline) |
| `bbh_cot_fewshot` | here | here | `included_template` | `regex` & `take_first` (in one pipeline) |
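For readers unfamiliar with these pipelines, the following is a rough sketch of how a pattern-matching filter chained with `take_first` could behave. All names (`regex_filter`, `take_first`, `run_pipeline`), the fallback string, and the pattern are illustrative assumptions, not the harness's actual filter classes or the regexes used in the BBH configs.

```python
import re
from typing import Callable, List

# One list of model responses per document (hypothetical data layout).
Responses = List[List[str]]


def regex_filter(pattern: str, fallback: str = "[invalid]") -> Callable[[Responses], Responses]:
    """Build a step that keeps the first capture group of each response, or a fallback."""
    compiled = re.compile(pattern)

    def apply(responses: Responses) -> Responses:
        filtered = []
        for doc_responses in responses:
            kept = []
            for response in doc_responses:
                match = compiled.search(response)
                kept.append(match.group(1) if match else fallback)
            filtered.append(kept)
        return filtered

    return apply


def take_first(responses: Responses) -> List[str]:
    """Keep only the first response for each document."""
    return [doc_responses[0] for doc_responses in responses]


def run_pipeline(steps, responses):
    """Apply each filter step in order, feeding the output of one into the next."""
    for step in steps:
        responses = step(responses)
    return responses


raw = [["I think the answer is (B).", "(A)"], ["No option given."]]
pipeline = [regex_filter(r"(\([A-Z]\))"), take_first]
result = run_pipeline(pipeline, raw)
```

With only `take_first` in the pipeline the raw first response is scored as-is, whereas adding the pattern step first normalizes it to an option label; that difference is why the choice of pipeline per group matters.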

This makes the processing more opaque - at least to me.

Cheers!

bbh_zeroshot fails due to a custom filter issue. #2422