What are the specific models used to compute difficulty score for MixEval-Hard?

Psycoy / MixEval

The official evaluation suite and dynamic data release for MixEval.

https://mixeval.github.io/


What are the specific models used to compute difficulty score for MixEval-Hard? #29

Closed: calvinh99 closed this issue 3 months ago

calvinh99 commented 3 months ago

Hi @Psycoy,

I couldn't find which models were used to calculate the difficulty scores for MixEval-Hard. Would it be possible to disclose the specific models or model IDs?

Thanks, Calvin

Psycoy commented 3 months ago

Hi Calvin,

We used all the models in the paper except gpt-4o, qwen-max, and mammoth 2 (which were updated later).