TIGER-AI-Lab / MMLU-Pro

The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
Apache License 2.0

New Model | meta-llama/Llama-3.1-405B-Instruct #41

Open agm-eratosth opened 3 weeks ago

EwoutH commented 1 week ago

@Wyyyb @wenhuchen I would really love to have Llama-3.1-405B-Instruct on the leaderboard.

Model: https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct

They claim an MMLU-Pro score of 73.3%.

Thanks again for all your hard work maintaining this leaderboard!

Wyyyb commented 1 week ago

Testing Llama-3.1-405B-Instruct locally would be challenging: the memory footprint of a 405B-parameter model exceeds what our local hardware can handle.
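For reference, a rough back-of-envelope estimate (assuming bf16 weights only, ignoring KV cache and activations, so the real requirement is higher):

```python
# Back-of-envelope memory estimate for serving Llama-3.1-405B-Instruct.
# Assumes bf16 weights (2 bytes per parameter); KV cache and activations
# would add further overhead on top of this.
params = 405e9
bytes_per_param = 2  # bf16
weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")                        # ~810 GB
print(f"80 GB GPUs needed just for the weights: {weights_gb / 80:.1f}")  # ~10+
```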

EwoutH commented 1 week ago

Would using an API be feasible? There are quite a few providers available: https://artificialanalysis.ai/models/llama-3-1-instruct-405b/providers
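As a minimal sketch, any OpenAI-compatible provider could be queried like this (the base URL, API key variable, and model name below are placeholders, not specific to any provider, and the prompt is only an MMLU-Pro-style example):

```python
# Sketch: query a hosted Llama-3.1-405B-Instruct endpoint with a
# zero-shot chain-of-thought prompt in the MMLU-Pro answer format.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # placeholder provider endpoint
    api_key=os.environ["PROVIDER_API_KEY"],          # placeholder key variable
)

question = "Which of the following is a prime number?"
options = ["A. 21", "B. 33", "C. 29", "D. 49"]
prompt = (
    "Answer the following multiple-choice question. Think step by step, "
    "then finish with 'The answer is (X)'.\n\n"
    + question + "\n" + "\n".join(options)
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-405B-Instruct",  # provider-specific model name may differ
    messages=[{"role": "user", "content": prompt}],
    temperature=0.0,
    max_tokens=1024,
)
print(resp.choices[0].message.content)
```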