TIGER-AI-Lab / MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
Apache License 2.0
133 stars · 22 forks
Issues
#49 Add gpt-4o-2024-11-20 (EwoutH, opened 5 days ago, 0 comments)
#48 Why not use ChatCompletion instead of Completion? (NagisaZj, opened 1 week ago, 0 comments)
#47 Add Gemini-Exp-1114 and Gemini-Exp-1121 (EwoutH, opened 1 week ago, 0 comments)
#46 Add Mistral-Large-Instruct-2411 (EwoutH, opened 1 week ago, 0 comments)
#45 Model requests - Gemini 2 27b, Claude 3 Haiku, Mixtral 8x22b (johns2s, closed 1 week ago, 1 comment)
#44 Hunyuan Large & Athene V2 (NSbuilder, closed 1 day ago, 6 comments)
#43 Add Tencent Hunyuan-Large (EwoutH, opened 2 weeks ago, 2 comments)
#42 Add Claude 3.5 Haiku (EwoutH, closed 1 week ago, 2 comments)
#41 New Model | meta-llama/Llama-3.1-405B-Instruct (agm-eratosth, opened 3 weeks ago, 3 comments)
#40 New Model | mistralai/Mistral-Large-Instruct-2407 (agm-eratosth, closed 1 week ago, 1 comment)
#39 Which DeepSeek-Coder-V2? (billbradley, closed 3 weeks ago, 1 comment)
#38 Add SmolLM2 1.7B (EwoutH, closed 3 weeks ago, 1 comment)
#37 New model | Cohere Aya Expanse (NSbuilder, closed 3 weeks ago, 1 comment)
#36 New model | Yi - Lightning (NSbuilder, closed 1 month ago, 1 comment)
#35 Add Mistral Small v24.09 (EwoutH, closed 1 month ago, 2 comments)
#34 Add Ministral 3B and 8B (EwoutH, closed 1 month ago, 1 comment)
#33 Iask api (mujtabaasif, closed 1 month ago, 0 comments)
#32 add support for iask-api (mujtabaasif, closed 1 month ago, 0 comments)
#31 What is the Arx-0.3 model? (DenisSergeevitch, closed 1 month ago, 1 comment)
#30 Llama-3.1-nemotron-70b-instruct (NSbuilder, closed 1 month ago, 2 comments)
#29 CUDA error: no kernel image is available for execution on the device (jakethesnake1126, closed 3 weeks ago, 0 comments)
#28 Added support for Gemini 1.5 Flash 8b. (dynamicwebpaige, closed 1 month ago, 1 comment)
#27 Update requirements to include google and anthropic (LoopControl, closed 1 month ago, 0 comments)
#26 regarding leaderboard submission (sorobedio, closed 1 month ago, 1 comment)
#25 Add Gemini-1.5-Flash-002 and -Pro-002 (EwoutH, closed 1 month ago, 2 comments)
#24 Paper claims there are 10-choices but the test split has varying number of choices (anywhere from 3 to 10) (eldarkurtic, closed 3 weeks ago, 6 comments)
#23 Suggested minimum context length requirement? (ubergarm, closed 2 months ago, 2 comments)
#22 Add Qwen2.5 model family (EwoutH, closed 2 months ago, 4 comments)
#21 OpenAI o1-preview and o1-mini (EwoutH, opened 2 months ago, 3 comments)
#20 Create SECURITY.md (tech-jun-jones, closed 2 months ago, 1 comment)
#19 Create generator-generic-ossf-slsa3-publish.yml (tech-jun-jones, closed 2 months ago, 0 comments)
#18 where is global_record_file="eval_results/eval_record_collection.csv"? (lianshan01, closed 2 months ago, 1 comment)
#17 eval_results do not contain the actual answer, right? (emanuelevivoli, closed 2 months ago, 2 comments)
#16 Questionable questions (billbradley, closed 2 months ago, 1 comment)
#15 Why dont use chat template for chat model? (eyuansu62, closed 2 months ago, 1 comment)
#14 Variable length of "options"? (billbradley, closed 2 months ago, 1 comment)
#13 Possible to remove spam model result (mrconter1, closed 3 months ago, 1 comment)
#12 Add Grok-2? (mrconter1, closed 3 months ago, 2 comments)
#11 Support for standard deviation (RodriMora, closed 2 months ago, 1 comment)
#10 Request for Llama3.1 8B, 70B and 405B (RodriMora, closed 3 months ago, 5 comments)
#9 Potential coding errors in `evaluate_from_api.py` (sudanl, closed 4 months ago, 1 comment)
#8 Updated the regex pattern in extract_final to use [A-J] between word boundries (chigkim, closed 4 months ago, 0 comments)
#7 Regex pattern in extract_final function. (chigkim, closed 4 months ago, 11 comments)
#6 Duplicates in test split (Pupy101, closed 4 months ago, 1 comment)
#5 Different Setup for Different Models? (chigkim, closed 4 months ago, 6 comments)
#4 Add Gemma 2 9B and 27B (carterprince, closed 4 months ago, 4 comments)
#3 Changed a typo from "Let think" to "Let's think" (chigkim, closed 5 months ago, 0 comments)
#2 Update README.md (eadst, closed 5 months ago, 0 comments)
#1 Chat template for instruct models for local eval (gnalbandyan, closed 5 months ago, 1 comment)
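Issues #7 and #8 concern the regular expression that `extract_final` uses to pull the chosen option letter out of a model's response. Issues #14 and #24 point out that test questions have anywhere from 3 to 10 options. The following is a minimal Python sketch of that kind of answer extraction. It assumes the response either states "the answer is (X)" or contains a standalone option letter. The function name `extract_choice` and its `num_options` parameter are hypothetical, and this is not the repository's `extract_final` code.

```python
import re
import string
from typing import Optional


def extract_choice(text: str, num_options: int = 10) -> Optional[str]:
    """Return the predicted option letter, or None if none is found.

    Hypothetical helper for illustration; not the repository's extract_final.
    """
    if not 1 <= num_options <= 26:
        raise ValueError("num_options must be between 1 and 26")
    # Valid letters for this question, e.g. "ABCD" for a 4-option question.
    letters = string.ascii_uppercase[:num_options]
    cls = f"[{letters}]"

    # 1) Prefer an explicit phrase such as "the answer is (C)".
    #    The trailing \b rejects cases like "the answer is Apple".
    m = re.search(rf"answer is \(?({cls})\b", text)
    if m:
        return m.group(1)

    # 2) Fall back to the last standalone option letter.
    #    \b on both sides stops letters inside words from matching.
    hits = re.findall(rf"\b({cls})\b", text)
    return hits[-1] if hits else None


print(extract_choice("Let's think step by step. The answer is (C)."))
```

Restricting the character class to the question's own option count keeps a 4-option question from yielding, say, "J". The word boundaries stop letters inside words from matching. However, standalone words such as the article "A" or the pronoun "I" can still be picked up by the fallback, so a real evaluator would need its own policy for those cases.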