default judge model setting for the leaderboard

EQ-bench / EQ-Bench

A benchmark for emotional intelligence in large language models

MIT License


Closed: gyin94 closed this issue 5 months ago

gyin94 commented 5 months ago

May I ask what the default judge model is?

sam-paech commented 5 months ago

For the creative writing leaderboard, it's claude-3-opus.

I will probably at some point make it an aggregate of multiple judges, since they all have a small amount of self-bias.
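
To illustrate the idea of aggregating several judges, here is a minimal sketch, not the leaderboard's actual method. It assumes each judge returns one numeric score per candidate model and that a plain mean is an acceptable aggregate. The function name `aggregate_scores`, the judge labels, and the scores are all hypothetical. As one possible way to limit self-bias, it also drops a judge's score when the judge is the model being evaluated.

```python
from statistics import mean


def aggregate_scores(scores_by_judge, candidate=None):
    """Average one candidate's scores across judges.

    scores_by_judge: dict mapping judge name -> score for this candidate.
    candidate: name of the model being scored; if it is also a judge,
               its own score is skipped (an assumed self-bias mitigation).
    """
    usable = [
        score
        for judge, score in scores_by_judge.items()
        if judge != candidate
    ]
    if not usable:
        raise ValueError("no independent judge scores available")
    return mean(usable)


# Illustrative numbers only, not real leaderboard data.
scores = {"judge-a": 70.0, "judge-b": 80.0, "judge-c": 90.0}

# Candidate is not one of the judges: all three scores are averaged.
print(aggregate_scores(scores))  # 80.0

# Candidate is also "judge-c": its self-assigned score is excluded.
print(aggregate_scores(scores, candidate="judge-c"))  # 75.0
```

A mean is the simplest choice; a median or a per-judge normalized score would be reasonable alternatives if the judges use the scale differently.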