EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License
6.41k stars 1.69k forks

Why no results for closed-sourced models? #2225

Closed mrconter1 closed 2 weeks ago

mrconter1 commented 3 weeks ago

Hi!

Why don't you benchmark closed models? Is there any project that does that?

Best regards

LSinev commented 2 weeks ago

No results? This framework does not publish any results by itself, AFAIK. Though there are multiple projects that use it as a workhorse to benchmark different models.

For now, there are two APIs provided for closed models in https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/models, and there is a template for adding more. The following PRs are waiting to be merged (or reviewed, or some other decision): https://github.com/EleutherAI/lm-evaluation-harness/pull/395 https://github.com/EleutherAI/lm-evaluation-harness/pull/834 https://github.com/EleutherAI/lm-evaluation-harness/pull/936 https://github.com/EleutherAI/lm-evaluation-harness/pull/1996
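As a rough sketch of what this looks like in practice, a closed model can be evaluated through one of those API backends with the harness CLI. The model name and task below are illustrative placeholders; check the docs for the backends and tasks actually available in your installed version:

```shell
# Assumes the harness is installed (pip install lm-eval) and an API key is set,
# e.g. export OPENAI_API_KEY=...  (the key itself is your own credential)
lm_eval \
  --model openai-chat-completions \
  --model_args model=gpt-4o-mini \
  --tasks hellaswag \
  --num_fewshot 5
```

Results are printed as a table per task; they are produced by whoever runs the command, which is why the repository itself does not ship leaderboard numbers for closed models.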