ai-forever / MERA

MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for evaluating foundation models in Russian.
MIT License

0.4.0 lm-evaluation-harness #15

Open germanjke opened 4 months ago

germanjke commented 4 months ago

Hi!

Your benchmarks are functioning well with version 0.3.0 of lm-evaluation-harness. Are there any plans to update and support version 0.4.0?

LSinev commented 4 months ago

Yes, there are! :) stay tuned!

LSinev commented 4 months ago

Do you have any particular expectations for improvements with the upgrade to the 0.4.0+ backend?

germanjke commented 4 months ago

@LSinev Hi, it looks like the vLLM engine supported in 0.4.0 runs faster than the HF engine.
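
For context, a rough sketch of how the two backends are selected in lm-evaluation-harness 0.4.x. The model path and task name are placeholders, not values from this thread:

```bash
# Hypothetical comparison of the two 0.4.x backends; <model_path> and
# <task_name> are placeholders, not MERA-specific values.

# Plain HuggingFace backend
lm_eval --model hf \
    --model_args pretrained=<model_path>,dtype=float16 \
    --tasks <task_name> \
    --batch_size 8

# vLLM backend (needs the vllm extra, e.g. pip install "lm_eval[vllm]")
lm_eval --model vllm \
    --model_args pretrained=<model_path>,dtype=auto,gpu_memory_utilization=0.8,tensor_parallel_size=1 \
    --tasks <task_name> \
    --batch_size auto
```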

germanjke commented 2 months ago

Hello! Can I ask whether you are still working on this, and whether you have any estimated dates? @LSinev

LSinev commented 2 months ago

Will give more information next week, or maybe even a work-in-progress branch for playing/testing.

LSinev commented 2 months ago

new_harness_codebase is a work-in-progress branch with a patched lm-evaluation-harness added as a submodule (the patch is waiting for a PR to be merged upstream). All scores will change. The leaderboard will not publish these yet, but you can use the branch for private scoring. Scoring of baseline models should be done by you. Changes to the model-running code (the lm-evaluation-harness side) should be made in their repository in order to be supported here.
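
A minimal sketch of checking out that branch together with its submodule, assuming the setup described above (the branch name is taken from the task directory link later in this thread):

```bash
# Clone the work-in-progress branch along with the patched
# lm-evaluation-harness submodule (assumed workflow, not an official how-to).
git clone --branch update/new_harness_codebase --recurse-submodules \
    https://github.com/ai-forever/MERA.git
cd MERA

# If the repository was cloned without submodules, pull them in afterwards.
git submodule update --init --recursive
```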

germanjke commented 2 months ago

great, thank you!

germanjke commented 1 month ago

Hi @LSinev,

I noticed that the tasks from the branch do not include the MERA tasks in 0.4.x format. I checked the link you provided here, and it seems they are indeed missing.

Could you please confirm if the MERA tasks will be added to this branch, or if there is another location where they might be available?

Thanks!

LSinev commented 1 month ago

> I checked the link you provided here, and it seems they are indeed missing.

This link goes to a fork of lm-evaluation-harness. That fork contains the code needed for the RuTiE task, which has been submitted as a PR to lm-evaluation-harness but is not yet approved and merged. There are no plans yet to submit the MERA tasks directly into lm-evaluation-harness.

new_harness_codebase uses the 0.4.x code, but the tasks are not fully in YAML format yet (they will be, but not yet, similar to, for example, the SQuADv2 task in lm-evaluation-harness). The MERA tasks are stored in https://github.com/ai-forever/MERA/tree/update/new_harness_codebase/benchmark_tasks, since the new code allows tasks to be loaded from an external directory.
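
As a sketch of what loading "tasks from an external directory" means in 0.4.x terms, the harness exposes an `--include_path` flag that registers task configs from a directory outside the package. The task name below is a placeholder; the actual names live in the benchmark_tasks directory:

```bash
# Hypothetical invocation: point the 0.4.x harness at the MERA task directory.
# <model_path> and <mera_task_name> are placeholders.
lm_eval --model hf \
    --model_args pretrained=<model_path> \
    --include_path ./benchmark_tasks \
    --tasks <mera_task_name> \
    --batch_size 8
```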