DCGM / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

First version for Klokan+Umimeto #4

Closed hynky1999 closed 4 months ago

hynky1999 commented 5 months ago

Purpose

Why handle KlokanQA as multiple-choice with all possible answers shown

Examples:

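Tři klokani váží dohromady 97 kg. Každý z nich má jinou hmotnost, kterou lze vyjádřit přirozeným číslem. Určete největší možnou hmotnost nejlehčího klokana. 1 kg | 30 kg | 31 kg | 32 kg | 33 kg

(English: Three kangaroos weigh 97 kg in total. Each has a different weight, expressible as a natural number. Determine the greatest possible weight of the lightest kangaroo. 1 kg | 30 kg | 31 kg | 32 kg | 33 kg)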

As can be seen, without the proposed answer options there are multiple correct solutions.
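A minimal sketch of what "all possible answers shown" could look like when rendered as a prompt. The function name `render_prompt`, the lettering, and the trailing `Answer:` cue are assumptions for illustration only, not the template used in this PR.

```python
from string import ascii_uppercase


def render_prompt(question: str, options: list[str]) -> str:
    """Render a question followed by every candidate answer (hypothetical template)."""
    lines = [question]
    # Label the options A, B, C, ... so the model sees all candidates.
    for letter, option in zip(ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


question = (
    "Tři klokani váží dohromady 97 kg. Každý z nich má jinou hmotnost, "
    "kterou lze vyjádřit přirozeným číslem. "
    "Určete největší možnou hmotnost nejlehčího klokana."
)
options = ["1 kg", "30 kg", "31 kg", "32 kg", "33 kg"]
prompt = render_prompt(question, options)
print(prompt)
```

With the options in the context, the model only has to rank the listed candidates rather than produce an answer in free form.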

Why handle UmimetoQA as multiple-choice with all possible answers shown

Examples:

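math;Jednotky hmotnosti: ze života;5;40 g;rohlík;varná konvice
(English: math;Units of mass: from everyday life;5;40 g;bread roll;electric kettle)
biology;Paprskoploutvé ryby;9;Hmyzem a drobnými bezobratlými živočichy se živí:;pstruh obecný;sumec velký
(English: biology;Ray-finned fishes;9;Feeds on insects and small invertebrates:;brown trout;wels catfish)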

For Umimeto, a non-A/B MMLU-style format would also work, but I think this version works better: the task has really poorly written assignments, and in my opinion this version conveys the context of the question better.

Why we use logprobs instead of exact_match?

I ran several tests on Mixtral, and for some questions it does not follow the expected format, which makes answer extraction infeasible. This is an even bigger problem if the LLMs are not instruction/RLHF-tuned and work only in completion mode. I had the same experience on my czeval benchmark with weak 7B models.
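The difference between the two scoring styles can be sketched as follows. This is a simplified illustration, not the harness code: the helper names `pick_by_logprob` and `extract_answer`, the `Answer: <option>` format, and the log-probability values are all made up for the example.

```python
import re
from typing import Optional


def pick_by_logprob(option_logprobs: dict[str, float]) -> str:
    """Logprob scoring: pick the option whose continuation is most likely."""
    return max(option_logprobs, key=option_logprobs.get)


def extract_answer(generation: str, options: list[str]) -> Optional[str]:
    """Exact-match style: succeeds only if the output is exactly 'Answer: <option>'."""
    match = re.fullmatch(r"\s*Answer:\s*(.+?)\s*", generation)
    if match and match.group(1) in options:
        return match.group(1)
    return None


options = ["rohlík", "varná konvice"]

# Hypothetical per-option log-probabilities; in practice the model supplies these.
logprobs = {"rohlík": -1.2, "varná konvice": -3.4}
chosen = pick_by_logprob(logprobs)

# A free-form generation that ignores the expected format cannot be extracted.
free_text = "Well, a bread roll usually weighs around 40 g."
extracted = extract_answer(free_text, options)

print(chosen)     # rohlík
print(extracted)  # None
```

Logprob scoring always yields one of the listed options, whereas extraction returns nothing whenever the model drifts from the expected format.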

Misc

The Umimeto dataset is currently unreachable. The reason is simple: it lives in a private repository under my personal account on the Hugging Face Hub. Since I don't have write permissions for the CZLC group, I can't create a repository there.

hynky1999 commented 4 months ago

To be safe, I ended up balancing Klokan (I randomly permuted the answers and updated the correct one accordingly).
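A minimal sketch of this kind of balancing, assuming each item stores its options as a list plus the index of the correct one. The function name `shuffle_options` and that data layout are assumptions, not the code from this PR.

```python
import random


def shuffle_options(
    options: list[str], correct_index: int, rng: random.Random
) -> tuple[list[str], int]:
    """Randomly permute the options and return the new index of the correct one."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    # The correct option moved to wherever its old index landed in `order`.
    return shuffled, order.index(correct_index)


rng = random.Random(0)  # fixed seed so the permutation is reproducible
options = ["1 kg", "30 kg", "31 kg", "32 kg", "33 kg"]
correct_index = 2  # "31 kg", since 31 + 32 + 34 = 97
shuffled, new_index = shuffle_options(options, correct_index, rng)
print(shuffled, new_index)
```

Applied to every item, this spreads the correct answer roughly evenly across positions, so a model cannot score well by favouring one slot.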

Maximum prompt lengths in tokens (without the description), using the gpt2 tokenizer

Umimeto-qa:

image

[('biology', 126),
 ('chemistry', 118),
 ('czech', 139),
 ('history', 135),
 ('informatics', 147),
 ('math', 114),
 ('physics', 125)]

Klokan-qa:

image

[(0, 243), (1, 248), (2, 340), (3, 264), (4, 330), (5, 273)]
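Per-group maxima like the ones listed here could be computed along the following lines. This is a sketch under assumptions: `max_prompt_lengths` is a hypothetical helper, the sample prompts are toy strings, and whitespace splitting stands in for the tokenizer so the example runs offline.

```python
from collections import defaultdict
from typing import Callable, Iterable


def max_prompt_lengths(
    prompts: Iterable[tuple[str, str]], tokenize: Callable[[str], list]
) -> list[tuple[str, int]]:
    """Return (group, longest prompt length in tokens) pairs, sorted by group."""
    longest: dict[str, int] = defaultdict(int)
    for group, prompt in prompts:
        longest[group] = max(longest[group], len(tokenize(prompt)))
    return sorted(longest.items())


# Toy (group, prompt) pairs; whitespace splitting replaces a real tokenizer here.
sample = [
    ("math", "40 g rohlík varná konvice"),
    ("math", "40 g"),
    ("biology", "pstruh obecný sumec velký"),
]
lengths = max_prompt_lengths(sample, str.split)
print(lengths)  # [('biology', 4), ('math', 5)]
```

To get GPT-2 token counts instead, the `encode` method of a Hugging Face `transformers` GPT-2 tokenizer could presumably be passed as `tokenize`; it is left out here to avoid a model download.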

Distribution of the individual classes

Umimeto-qa:

[image: class distribution plot]

Klokan-qa:

[image: class distribution plot]

MFajcik commented 4 months ago

Thanks!!!