TIGER-AI-Lab / MMLU-Pro

The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
Apache License 2.0

Add Qwen2.5 model family #22

Closed: EwoutH closed this issue 2 months ago

EwoutH commented 2 months ago

Tracking issue for the Qwen2.5 model family. These models are state-of-the-art at their sizes on many benchmarks, and most are released under a permissive Apache 2.0 license.

Models on HuggingFace: Qwen2.5 | Qwen2.5-Coder | Qwen2.5-Math.

There are a lot of models in total, but I would add (at least):

Instruction-tuned variants are also available for most of the models.

ubergarm commented 2 months ago

Some benchmark results for various quants are already showing up; they were run with chigkim/Ollama-MMLU-Pro and posted in this r/LocalLLaMA thread on reddit.

Looks promising so far...
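
For anyone who wants to poke at this locally before official numbers land: below is a minimal sketch of how one can query an Ollama server through its OpenAI-compatible `/v1` endpoint, which is the kind of interface chigkim/Ollama-MMLU-Pro talks to. The model tag `qwen2.5:7b-instruct` and the sample question are illustrative assumptions, not taken from the actual harness.

```python
# Minimal sketch: score one multiple-choice question against a local Ollama
# server via its OpenAI-compatible /v1 endpoint. The model tag and the sample
# question below are illustrative assumptions, not part of the real harness.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",                      # any non-empty string works locally
)

question = "Which gas makes up most of Earth's atmosphere?"
options = ["A) Oxygen", "B) Nitrogen", "C) Argon", "D) Carbon dioxide"]

response = client.chat.completions.create(
    model="qwen2.5:7b-instruct",  # assumed local tag; substitute your own
    messages=[
        {"role": "system", "content": "Answer with a single letter."},
        {"role": "user", "content": question + "\n" + "\n".join(options)},
    ],
    temperature=0.0,  # greedy decoding for reproducible answers
)
print(response.choices[0].message.content)  # expect "B"
```

A real run would loop this over the MMLU-Pro question set and parse the letter out of each reply, which is essentially what the linked tool automates.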

Wyyyb commented 2 months ago

Thank you for your suggestions. We have updated the evaluation results for the Qwen2.5 series models in the leaderboard.

ubergarm commented 2 months ago

Just to save future folks a few clicks, the leaderboard is here: https://huggingface.co/spaces/TIGER-Lab/MMLU-Pro

EwoutH commented 2 months ago

> We have updated the evaluation results for the Qwen2.5 series models in the leaderboard.

Awesome! I saw the results are self-reported; are you planning to validate (one or more of) the Qwen2.5 models?