eminorhan opened this issue 7 hours ago
Hi! This should be fixed by the PR.
Thank you very much for the quick response. I can confirm that the submitted PR fixes the issue for me.
Hmm, I'm getting awfully low scores though. Accuracy for llama-3.1-8b seems to be in the single digits, whereas Meta reports it to be 73.0. Are we sure that the config for this task is set up correctly? It's hard for me to verify this, since I'm very new to this library. Would you be able to check the performance of the model meta-llama/Llama-3.1-8B, for instance?
Do you have a reference for this?
Yes, this post by Meta reports the performance of llama-3.1-8b on zero-shot CoT MMLU to be 73.0: https://ai.meta.com/blog/meta-llama-3-1/. That is supposed to be the same eval as mmlu_flan_cot_zeroshot, right?
I'm new to this library and installed it as instructed in the README.
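I ran the evaluation with roughly the following command (my exact invocation isn't reproduced here; the device and batch-size flags below are just illustrative, while the task and model are the ones discussed above):

```bash
# Illustrative device/batch size; the model and task are the ones from this issue.
lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-3.1-8B \
    --tasks mmlu_flan_cot_zeroshot \
    --device cuda:0 \
    --batch_size 8
```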
This results in the following error:
The same issue is mentioned in several other threads, e.g. https://github.com/EleutherAI/lm-evaluation-harness/issues/1885 and https://github.com/EleutherAI/lm-evaluation-harness/issues/2094, so it seems to be a long-standing problem.
I tried several previous releases (pinned installs along the lines of the sketch below), but I get a panoply of errors each time. Is there any way at all to evaluate models on mmlu_flan tasks (a particular previous commit that demonstrably works, a relatively straightforward patch, etc.)? These are pretty fundamental tasks in LLM evals, and it seems a bit unfortunate that the library persistently fails on this set of tasks.
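For reference, when trying older releases I've been installing pinned versions roughly like this (the <tag-or-commit> below is a placeholder, not a specific version I'm claiming works):

```bash
# Install lm-evaluation-harness pinned to a specific tag or commit.
# <tag-or-commit> is a placeholder, not a known-good version.
pip install "git+https://github.com/EleutherAI/lm-evaluation-harness.git@<tag-or-commit>"
```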