EQ-bench / EQ-Bench

A benchmark for emotional intelligence in large language models
MIT License
180 stars, 13 forks

Add some of the new 100B+ models to the leaderboard #5

Closed. cosmojg closed this issue 7 months ago

cosmojg commented 7 months ago

It would be really cool to see how some of the more recent megamerges stack up against the rest of the competition, especially those which build on top of miqu-1-70b. Specifically, I think it would be interesting to benchmark the EQ of the following models:

And if it's not way too big to benchmark...

sam-paech commented 7 months ago

Thx for the suggestions. I actually did some of these already and just added the latest to the leaderboard.

wolfram/miquliz-120b-v2.0    82.21
alpindale/miquella-120b      82.15
migtissera/Tess-72B-v1.5b    81.78

I'll see if I can get TheProfessor to run, that one looks like a beast. Not sure if I'll get to lzlv & miquliz soon but they're on my list.

cosmojg commented 7 months ago

@sqrkl Oh sweet, that's awesome! Thank you for all of your hard work! Out of curiosity, what kind of hardware are you using to run these benchmarks?

sam-paech commented 7 months ago

I spin up instances on runpod or vast.ai. My local hardware can't really run most of these models.

sam-paech commented 7 months ago

Just added TheProfessor-155b.