Closed — hynky1999 closed this 4 months ago
To be safe, I ended up balancing KlokanQA (randomly permuted the answer options and updated the correct one).
gpt2 tokenizer:
```
[('biology', 126),
 ('chemistry', 118),
 ('czech', 139),
 ('history', 135),
 ('informatics', 147),
 ('math', 114),
 ('physics', 125)]
[(0, 243), (1, 248), (2, 340), (3, 264), (4, 330), (5, 273)]
```
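The balancing step mentioned above (randomly permuting the answer options and updating the correct index) could look roughly like the following sketch. The `choices`/`answer` field names are illustrative assumptions, not the actual dataset schema:

```python
import random

def balance_answers(question):
    """Randomly permute the answer options of a multiple-choice
    question and update the index of the correct answer to match.

    Assumes `question` is a dict with a list of answer strings under
    "choices" and the correct choice's index under "answer"
    (hypothetical field names).
    """
    choices = question["choices"]
    correct_text = choices[question["answer"]]
    permuted = choices[:]          # copy so the input stays untouched
    random.shuffle(permuted)
    return {
        **question,
        "choices": permuted,
        "answer": permuted.index(correct_text),
    }
```

Applying this over the whole dataset is what flattens the per-index counts shown above, since any position bias in the original correct-answer placement is randomized away.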
Thanks!!!
Purpose
Why handle KlokanQA as multiple-choice with all possible answers shown
Examples:
As can be seen, without showing the proposed answer options there are multiple correct solutions.
Why handle UmimetoQA as multiple-choice with all possible answers shown
Examples:
For UmimetoQA a non-A/B MMLU style would also work, but I think this version works better: the task has really poorly written assignments, and showing the second option renders the context of the question better, in my opinion.
Why do we use logprobs instead of exact_match?
I ran several tests on Mixtral, and for some questions it would not follow the expected format, making answer extraction unfeasible. This is an even bigger problem if the LLM is not instruction/RLHF tuned and works only in completion mode. I had the same experience with weak 7B models on my czeval benchmark.
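The logprob-based approach avoids parsing generated text altogether: each candidate answer is scored by the log-probability the model assigns to it as a continuation of the prompt, and the argmax is taken. A minimal sketch, where `logprob_fn` is a hypothetical stand-in for the actual model call:

```python
def pick_answer(logprob_fn, prompt, choices):
    """Select the answer whose continuation the model finds most likely.

    `logprob_fn(prompt, continuation)` is a placeholder for a model call
    returning the total log-probability of `continuation` given `prompt`.
    Scores are length-normalised so longer answers are not penalised
    simply for containing more tokens/characters.
    """
    scores = [
        logprob_fn(prompt, choice) / max(len(choice), 1)
        for choice in choices
    ]
    # Return the index of the highest-scoring choice.
    return max(range(len(choices)), key=lambda i: scores[i])
```

Because this only requires the model to assign likelihoods, it works identically for base (completion-only) and instruction-tuned models, which is exactly the failure mode exact_match runs into.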
Misc
The umimeto dataset is unreachable. The reason is simple: it currently lives in my personal repository on Hugging Face in private mode. Since I don't have write permissions for the CZLC group, I can't create a repository there.