How to use a OS/local Model for LLM as Judge Response Evaluation

Psycoy / MixEval

The official evaluation suite and dynamic data release for MixEval.

https://mixeval.github.io/

224 stars 34 forks source link

How to use a OS/local Model for LLM as Judge Response Evaluation #45

Open carstendraschner opened 1 month ago

carstendraschner commented 1 month ago

HI @Psycoy

I need to have the option to evaluate the Benchmark with an Open Source Model as LLM-Judge. ~~How Can I do that, if this is note possible shall we work on a PR?~~ I have started a PR: https://github.com/Psycoy/MixEval/pull/46 Reason for this issue...We might face:

sensible own benchmark data
do not want to pay for OpenAI
face to many rate limit errors

regards Carsten

Psycoy commented 1 week ago

Hi, I have merged the PR and now it should be able to use. Thanks for your great efforts!