TIGER-AI-Lab MMLU-Pro issues

The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]

Apache License 2.0

133 stars 22 forks

issues

Add gpt-4o-2024-11-20

#49 EwoutH opened 5 days ago
0
Why not use ChatCompletion instead of Completion?

#48 NagisaZj opened 1 week ago
0
Add Gemini-Exp-1114 and Gemini-Exp-1121

#47 EwoutH opened 1 week ago
0
Add Mistral-Large-Instruct-2411

#46 EwoutH opened 1 week ago
0
Model requests - Gemini 2 27b, Claude 3 Haiku, Mixtral 8x22b

#45 johns2s closed 1 week ago
1
Hunyuan Large & Athene V2

#44 NSbuilder closed 1 day ago
6
Add Tencent Hunyuan-Large

#43 EwoutH opened 2 weeks ago
2
Add Claude 3.5 Haiku

#42 EwoutH closed 1 week ago
2
New Model | meta-llama/Llama-3.1-405B-Instruct

#41 agm-eratosth opened 3 weeks ago
3
New Model | mistralai/Mistral-Large-Instruct-2407

#40 agm-eratosth closed 1 week ago
1
Which DeepSeek-Coder-V2?

#39 billbradley closed 3 weeks ago
1
Add SmolLM2 1.7B

#38 EwoutH closed 3 weeks ago
1
New model | Cohere Aya Expanse

#37 NSbuilder closed 3 weeks ago
1
New model | Yi-Lightning

#36 NSbuilder closed 1 month ago
1
Add Mistral Small v24.09

#35 EwoutH closed 1 month ago
2
Add Ministral 3B and 8B

#34 EwoutH closed 1 month ago
1
Iask api

#33 mujtabaasif closed 1 month ago
0
add support for iask-api

#32 mujtabaasif closed 1 month ago
0
What is the Arx-0.3 model?

#31 DenisSergeevitch closed 1 month ago
1
Llama-3.1-nemotron-70b-instruct

#30 NSbuilder closed 1 month ago
2
CUDA error: no kernel image is available for execution on the device

#29 jakethesnake1126 closed 3 weeks ago
0
Added support for Gemini 1.5 Flash 8b.

#28 dynamicwebpaige closed 1 month ago
1
Update requirements to include google and anthropic

#27 LoopControl closed 1 month ago
0
regarding leaderboard submission

#26 sorobedio closed 1 month ago
1
Add Gemini-1.5-Flash-002 and -Pro-002

#25 EwoutH closed 1 month ago
2
Paper claims there are 10 choices, but the test split has a varying number of choices (anywhere from 3 to 10)

#24 eldarkurtic closed 3 weeks ago
6
Suggested minimum context length requirement?

#23 ubergarm closed 2 months ago
2
Add Qwen2.5 model family

#22 EwoutH closed 2 months ago
4
OpenAI o1-preview and o1-mini

#21 EwoutH opened 2 months ago
3
Create SECURITY.md

#20 tech-jun-jones closed 2 months ago
1
Create generator-generic-ossf-slsa3-publish.yml

#19 tech-jun-jones closed 2 months ago
0
Where is global_record_file="eval_results/eval_record_collection.csv"?

#18 lianshan01 closed 2 months ago
1
eval_results do not contain the actual answer, right?

#17 emanuelevivoli closed 2 months ago
2
Questionable questions

#16 billbradley closed 2 months ago
1
Why not use the chat template for chat models?

#15 eyuansu62 closed 2 months ago
1
Variable length of "options"?

#14 billbradley closed 2 months ago
1
Possible to remove spam model result

#13 mrconter1 closed 3 months ago
1
Add Grok-2?

#12 mrconter1 closed 3 months ago
2
Support for standard deviation

#11 RodriMora closed 2 months ago
1
Request for Llama3.1 8B, 70B and 405B

#10 RodriMora closed 3 months ago
5
Potential coding errors in `evaluate_from_api.py`

#9 sudanl closed 4 months ago
1
Updated the regex pattern in extract_final to use [A-J] between word boundaries

#8 chigkim closed 4 months ago
0
Regex pattern in extract_final function.

#7 chigkim closed 4 months ago
11
Duplicates in test split

#6 Pupy101 closed 4 months ago
1
Different Setup for Different Models?

#5 chigkim closed 4 months ago
6
Add Gemma 2 9B and 27B

#4 carterprince closed 4 months ago
4
Changed a typo from "Let think" to "Let's think"

#3 chigkim closed 5 months ago
0
Update README.md

#2 eadst closed 5 months ago
0
Chat template for instruct models for local eval

#1 gnalbandyan closed 5 months ago
1