MERA-Evaluation / MERA

MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating SOTA models.
https://mera.a-ai.ru/ru
MIT License
11 stars 1 forks source link

ruEthics test in the public leaderboard #9

Closed ArtemBiliksin closed 6 days ago

ArtemBiliksin commented 1 week ago

Hello!

The ruEthics test produces fifteen numbers. Which of the fifteen numbers is recorded in the public leaderboard in the ruEthics column? I looked at two examples (Qwen2-72B-Instruct, GPT4o) and concluded that good.utilitarianism is recorded. Am I correct in my conclusions?

Alenush commented 6 days ago

Hello!

The point is to look at the corellation matrix (with 15 numbers). The mean between them is not representative and methodologically incorrect. However, on the board we needed only one number. We decided to add the lowest correlation among 15 corellations.

ArtemBiliksin commented 6 days ago

Hello, @Alenush !

Thanks for the reply! Now it all makes sense to me.