allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0

Fix mmlu bpb bug only scoring answer=A questions #718

Closed. OyvindTafjord closed this 2 months ago

OyvindTafjord commented 2 months ago

Same problem as the one fixed for oe-eval tasks in https://github.com/allenai/OLMo/pull/712. I forgot there was separate handling for MMLU.

With the previous code, only questions with gold answer A were counted in the bpb evaluations. With this fix, all questions should be counted.

liujch1998 commented 2 months ago

Thanks Oyvind! Approving this PR.

Though it seems to me that not resetting label to 0 is fine for MMLU: MMLU’s prep_examples(), which is inherited from ICLMultiChoiceTaskDataset, does not skip cases where label_id and cont_id mismatch when metric=bpb (whereas OEEvalTask does), and ICLMetric.compute() does not use label_id when metric=bpb. So questions with any label_id should already have been included in the metric computation.
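
To make the distinction concrete, here is a toy sketch. It is not OLMo code: `prep_rows`, `docs_scored`, and the positional lookup by `label_id` are hypothetical names and assumptions invented for illustration. It only shows how a prep step that keeps just the gold continuation could leave questions with a non-A gold answer unscored unless the label is reset to 0, while a prep step that keeps every continuation would not be affected.

```python
from dataclasses import dataclass


@dataclass
class Row:
    doc_id: int
    cont_id: int   # index of this answer choice (0 = A, 1 = B, ...)
    label_id: int  # index of the gold answer for this question


def prep_rows(gold_labels, num_choices=4, gold_only=False, reset_label=False):
    """Expand each question into one row per answer choice (hypothetical helper).

    gold_only=True mimics a prep step that skips rows where cont_id != label_id.
    reset_label=True rewrites label_id to 0 for the single row that survives.
    """
    rows = []
    for doc_id, label in enumerate(gold_labels):
        for cont_id in range(num_choices):
            if gold_only and cont_id != label:
                continue
            stored_label = 0 if (gold_only and reset_label) else label
            rows.append(Row(doc_id, cont_id, stored_label))
    return rows


def docs_scored(rows):
    """Hypothetical scorer: a question counts only if its label_id is a valid
    position among the rows that were kept for that question."""
    by_doc = {}
    for row in rows:
        by_doc.setdefault(row.doc_id, []).append(row)
    return sorted(doc for doc, kept in by_doc.items() if kept[0].label_id < len(kept))


gold = [0, 1, 2, 3, 0]  # gold answers A, B, C, D, A

# Gold-only filtering with the original label kept: only answer=A questions count.
print(docs_scored(prep_rows(gold, gold_only=True)))                    # [0, 4]
# Gold-only filtering with the label reset to 0: every question counts.
print(docs_scored(prep_rows(gold, gold_only=True, reset_label=True)))  # [0, 1, 2, 3, 4]
# No filtering: every question counts without any reset.
print(docs_scored(prep_rows(gold)))                                    # [0, 1, 2, 3, 4]
```

Under these assumptions, the reset only matters when the prep step filters rows down to the gold continuation, which matches the point above about why it may be unnecessary for MMLU.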