EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License
6.84k stars, 1.82k forks

TODOs for Implementing LLM-as-a-Judge in Eval-Harness (Work in Progress) #2233

Open: SeungoneKim opened this issue 2 months ago

SeungoneKim commented 2 months ago

@haileyschoelkopf @lintangsutawika @baberabb

The following is a list of TODOs to implement LLM-as-a-Judge in Eval-Harness:

TLDR

abzb1 commented 1 day ago

Is there any progress on this? Thank you.