You can try following their setup instructions and filing the issue there: https://github.com/allenai/open-instruct?tab=readme-ov-file#setup
Otherwise, you can evaluate these tasks with the lm-evaluation-harness, though the results may differ slightly: https://github.com/EleutherAI/lm-evaluation-harness
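For the harness route, here is a minimal sketch of how such a run might be assembled. It assumes a recent lm-evaluation-harness release that installs the `lm_eval` command, and that `mmlu` and `gsm8k` are valid task names in your installed version (running `lm_eval --tasks list` shows what is available, which is also where to look for BBH and TyDi QA). The helper `build_lm_eval_command`, the model path, and the output directory are hypothetical placeholders, not part of either repository.

```python
import shlex


def build_lm_eval_command(model_path, tasks, batch_size=8, output_path="results"):
    """Return the argv list for an lm-evaluation-harness CLI run.

    Hypothetical helper for illustration; it only builds the command
    and does not run it.
    """
    return [
        "lm_eval",
        "--model", "hf",                             # Hugging Face transformers backend
        "--model_args", f"pretrained={model_path}",  # local path or hub model id
        "--tasks", ",".join(tasks),                  # comma-separated task names
        "--batch_size", str(batch_size),
        "--output_path", output_path,                # where result files are written
    ]


if __name__ == "__main__":
    # "path/to/your-model" is a placeholder; substitute your own checkpoint.
    cmd = build_lm_eval_command("path/to/your-model", ["mmlu", "gsm8k"])
    print(shlex.join(cmd))
```

Printing the command lets you check it before pasting it into a shell. Prompt formats and few-shot settings in the harness may not match open-instruct's scripts, which is one reason the numbers can differ.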
Thank you for your reply! I will try them later.
Hello.
When I execute the generative evaluation, various types of exceptions occur.
For example, I cannot install auto-gptq and vllm, and the error messages are very hard to understand. (There may be conflicts among the packages in open-instruct's requirements.txt.)
I ignored these issues because the errors are so difficult to understand and I could not find any information about them online. After that, when I run bash scripts/generative_eval.sh, it also fails with many errors.
My question is: could you please provide more detailed instructions for the generative evaluation? Covering only MMLU/GSM8K/BBH/TyDi QA would be enough. Or are there any other approaches to evaluating on these datasets?