calvinh99 closed this issue 3 months ago
Hi @Psycoy ,
I couldn't find which models were used in the difficulty score calculation of MixEval-Hard. Would it be possible to disclose the specific models/model ids?
Thanks, Calvin
Hi Calvin,
We used all the models in the paper except gpt-4o, qwen-max, and mammoth 2 (which were updated later).