Generative evaluation failed due to various errors

zillion-zhao commented 3 weeks ago

Hello.

When I execute the generative evaluation, various types of exceptions occur.

For example, I cannot install auto-gptq or vllm, and the error messages are very hard to understand. (Maybe there are conflicts in the requirements.txt of open-instruct.)

I ignored these issues because the errors are so difficult to understand and I could not find any information about them on the Internet. After that, when I run bash scripts/generative_eval.sh, it also fails with a lot of error output.

My question is: could you please provide more detailed instructions for the generative evaluation? Covering only MMLU/GSM8K/BBH/TyDi QA would be enough. Or are there any other approaches to evaluating on these datasets?
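When reporting install failures like the ones described above, it usually helps to state which of the packages actually ended up installed and at which version. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` and the inclusion of `torch` in the list are assumptions made for illustration and are not part of open-instruct or gritlm.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package list is an assumption: the two packages named in the report, plus torch.
    report = installed_versions(["auto-gptq", "vllm", "torch"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running `pip check` in the same environment additionally lists installed packages whose declared requirements are not satisfied, which can help narrow down a suspected conflict.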

Muennighoff commented 3 weeks ago

You can try following their setup and raising the issue there: https://github.com/allenai/open-instruct?tab=readme-ov-file#setup

Otherwise, you can evaluate these tasks with the lm eval harness, but the results may differ slightly: https://github.com/EleutherAI/lm-evaluation-harness
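As a rough sketch of the harness route, the snippet below only assembles an `lm_eval` command line and prints it, without running anything. The model path `GritLM/GritLM-7B`, the task names `mmlu`, `gsm8k`, and `bbh`, and the batch size are assumptions; the names available in an installed harness version can be listed with `lm_eval --tasks list`. TyDi QA is left out because its task name in the harness is not confirmed here. The helper `build_lm_eval_command` is hypothetical and not part of either repository.

```python
import shlex


def build_lm_eval_command(model_path, tasks, batch_size=8):
    """Assemble an lm-evaluation-harness CLI call as an argument list."""
    return [
        "lm_eval",
        "--model", "hf",                              # Hugging Face transformers backend
        "--model_args", f"pretrained={model_path}",   # model name or local path
        "--tasks", ",".join(tasks),                   # comma-separated task names
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    # Model path and task names are assumptions; verify them before running.
    cmd = build_lm_eval_command("GritLM/GritLM-7B", ["mmlu", "gsm8k", "bbh"])
    print(shlex.join(cmd))
```

The printed command can be pasted into a shell once the harness is installed. Prompt formats and few-shot settings in the harness may not match those used by open-instruct, which is one likely reason for the slight differences in results.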

zillion-zhao commented 3 weeks ago

Thank you for your reply! I will try them later.

ContextualAI / gritlm, issue #40