Psycoy / MixEval

The official evaluation suite and dynamic data release for MixEval.
https://mixeval.github.io/

Examples for open-source model judges & parsers #15

Closed: IdoGalilDeci closed this issue 2 months ago

IdoGalilDeci commented 2 months ago

Hey, thanks for your great work!

Could you please provide examples of how to run the benchmarks with an open-source parser/judge as an alternative to GPT-3.5? The README mentions that "Open-source model parsers are also supported.", but I couldn't figure out how exactly to set them up with mix_eval.evaluate, or whether any specific settings are required for running an open-source model. Lastly, the paper mentions that "We will also provide an open-source model parser with its stability test to ensure long-term reproducibility". If you could provide such a tested open-source model, that would be amazing.
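For concreteness, here is the kind of thing I have in mind. This is only an illustrative sketch, not MixEval's actual API: a free-form answer parser backed by an open-source judge model via Hugging Face transformers. The judge model name, prompt format, and scoring convention below are all made up for illustration.

```python
# Illustrative sketch only (NOT MixEval's API): what an open-source
# model-based free-form answer parser could look like.
from transformers import pipeline

# Placeholder judge model; any open-source instruct model could stand in here.
judge = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",
    device_map="auto",
)

def score_freeform(question: str, gold_answer: str, model_answer: str) -> float:
    """Ask the judge to rate the model answer against the gold answer on [0, 1]."""
    # Hypothetical prompt format, not the one MixEval uses.
    prompt = (
        "You are grading a model's answer against a reference answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {gold_answer}\n"
        f"Model answer: {model_answer}\n"
        "Reply with a single score between 0.0 and 1.0."
    )
    output = judge(prompt, max_new_tokens=8, do_sample=False)[0]["generated_text"]
    completion = output[len(prompt):]  # strip the echoed prompt
    try:
        return float(completion.strip().split()[0])
    except (ValueError, IndexError):
        return 0.0  # unparseable judge output counts as no credit

print(score_freeform("What is 2 + 2?", "4", "The answer is 4."))
```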

Thanks!

Psycoy commented 2 months ago

Hey,

Thanks for using MixEval. We are currently working on the open-source model parser. Stay tuned!

IdoGalilDeci commented 2 months ago

Thanks! Sorry for asking, but so that I can plan my work schedule better: is it due soon, or should I consider alternatives to an open-source parser?

Psycoy commented 2 months ago

Oh sorry, it's not due soon. I suggest you use the official parser for now, which has been tested and is stable.

Psycoy commented 2 months ago

By the official parser, I mean GPT-35-Turbo-0125, as specified in the repo.
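Roughly, the workflow with the official parser is the two-step process sketched below; please check the README for the exact, current flags, since they may change. The model name and flag values here are just the README's example.

```bash
# 1. Put the OpenAI key for the model parser in a .env file at the repo root
#    (key name per the repo's setup instructions):
#    MODEL_PARSER_API=<your openai api key>

# 2. Generate model responses:
python -m mix_eval.evaluate \
    --model_name gemma_11_7b_instruct \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --output_dir mix_eval/data/model_responses/

# 3. Parse the responses and compute scores; this is the step that
#    calls GPT-35-Turbo-0125 as the judge:
python -m mix_eval.compute_metrics \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --model_response_dir mix_eval/data/model_responses/ \
    --models_to_eval gemma_11_7b_instruct
```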

IdoGalilDeci commented 2 months ago

Thanks. I think open-source / custom judges and parsers will be very beneficial for practitioners and will help with benchmark adoption, by reducing the cost and improving the feasibility of using it at scale.

Psycoy commented 2 months ago

Indeed, we will try to finish the test for the open-source parser ASAP!