Closed by sglucas 11 months ago
Which three are we missing?
Hi, I noticed a discrepancy in the file names and found that 6 tasks are missing: boolean_expressions, multistep_arithmetic_two, object_counting, penguins_in_a_table, web_of_lies, and word_sorting. See https://github.com/suzgunmirac/BIG-Bench-Hard/tree/main/bbh
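For anyone who wants to repeat this kind of file-name comparison, here is a minimal sketch using a set difference. The helper name `missing_tasks` and the toy input lists are assumptions for illustration only; they are not the real directory listings of either repo.

```python
from pathlib import Path


def missing_tasks(reference_files, local_files):
    """Return task names present in the reference list but absent locally.

    Both arguments are iterables of file names; extensions such as
    ".json" or ".yaml" are stripped before comparing.
    """
    reference = {Path(name).stem for name in reference_files}
    local = {Path(name).stem for name in local_files}
    return sorted(reference - local)


# Toy inputs (hypothetical, not the actual repo contents).
reference_files = ["boolean_expressions.json", "dyck_languages.json", "word_sorting.json"]
local_files = ["dyck_languages.yaml"]

print(missing_tasks(reference_files, local_files))
```

In practice the two lists would come from listing the BBH directory linked above and the task directory in this repo.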
@StellaAthena also, it seems the number of examples doesn't match. Most datasets listed in the original BBH repo (BIG-Bench-Hard) have 250 data points, but the current dyck_languages, for example, has 1000 examples. One possible reason is that the current resources come from BIG-bench rather than BBH.
Hi! In the big-refactor branch (soon to be the next major version release) we support BBH as implemented in the BBH paper, including 3-shot CoT with the 250 subselected examples and prompts that match theirs: https://github.com/EleutherAI/lm-evaluation-harness/tree/big-refactor/lm_eval/tasks/bbh/flan_cot_fewshot
Hi,
I see that the official BBH contains 23 tasks, but I can only find 20 tasks in this repo. Do you plan to add more tasks to this repo?