EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

What are `mmlu_continuation` and `mmlu_generative`? #2255

Closed shizhediao closed 2 months ago

shizhediao commented 2 months ago

What are mmlu_continuation and mmlu_generative? Where can I find their description?

I am going to test MMLU in the cloze style, as in the following illustration: [image]

Thank you!

baberabb commented 2 months ago

Hi! `mmlu_generative` is a generation variant of MMLU (so it can be used when log probs aren't available). `mmlu_continuation` is cloze-style multiple choice, but it does not put the choices in the context. So, for example, the scored prompts will be `<question> <answer_choice_a>`, `<question> <answer_choice_b>`, ... (compared to default MMLU, where they are `<question> <choices> A.`, `<question> <choices> B.`).
If you want the choices in context as well, I think the simplest way is probably to replace `doc_to_choice` in the default config with `{{choices}}`.
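A minimal sketch of that override, assuming you copy the default MMLU task YAML and change only the fields below (field names as used in the harness's task-config format elsewhere in this thread):

```yaml
# Sketch: keep the default MMLU doc_to_text (question + lettered choices),
# but score each full answer string as the continuation, rather than the
# letters A/B/C/D. "{{choices}}" is a Jinja template over the dataset column.
doc_to_choice: "{{choices}}"
doc_to_target: answer
```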

shizhediao commented 2 months ago

I see. It is quite clear. Thank you so much!

shizhediao commented 2 months ago

Hi, just to confirm, is my following config correct?

dataset_path: hails/mmlu_no_train # a copy of `cais/mmlu` with no auxiliary_train split
test_split: test
fewshot_split: dev
fewshot_config:
  sampler: first_n
output_type: multiple_choice
doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:"
doc_to_choice: "{{choices}}"
doc_to_target: answer
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
dataset_kwargs:
  trust_remote_code: true

haileyschoelkopf commented 2 months ago

Hi @shizhediao, this looks correct to me, although if you wanted the standard preamble ("The following are multiple choice questions (with answers) about ...") you would need to add it via the `description` config field!
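For example (a sketch, not the harness's shipped MMLU config — the exact preamble wording and whether the subject is templated vary by task file; the subject name below is a placeholder):

```yaml
# Prepended verbatim before the few-shot examples and the question.
description: "The following are multiple choice questions (with answers) about abstract algebra.\n\n"
```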

shizhediao commented 2 months ago

Thank you!