Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
https://lightning.ai
Apache License 2.0

Code evaluation using bigcode-evaluation-harness framework #1776

Open mtasic85 opened 1 month ago

mtasic85 commented 1 month ago

Code evaluation tasks/benchmarks such as HumanEval and MBPP are missing from lm-evaluation-harness, but they are present and maintained in bigcode-evaluation-harness.

https://github.com/bigcode-project/bigcode-evaluation-harness

Since we would otherwise need to parse task names and check whether each one lives in lm-evaluation-harness or bigcode-evaluation-harness, I propose keeping litgpt evaluate as-is but adding a --framework argument that accepts "lm-evaluation-harness" (the default if not specified) or "bigcode-evaluation-harness". A rough sketch of the dispatch is below.
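For illustration only, here is a minimal sketch of what the dispatch could look like; the function signature and backend calls are placeholders I made up for this issue, not litgpt's actual implementation:

```python
# Hypothetical sketch: a --framework flag that selects the evaluation backend
# while keeping the current lm-evaluation-harness behavior as the default.
#
# Illustrative CLI usage once the flag exists (not a real flag yet):
#   litgpt evaluate checkpoints/my-model --tasks "humaneval" --framework "bigcode-evaluation-harness"

def evaluate(checkpoint_dir: str, tasks: str,
             framework: str = "lm-evaluation-harness") -> None:
    if framework == "lm-evaluation-harness":
        # existing path: delegate to EleutherAI's lm-evaluation-harness
        print(f"Running lm-evaluation-harness tasks {tasks!r} on {checkpoint_dir}")
    elif framework == "bigcode-evaluation-harness":
        # new path: delegate to bigcode-evaluation-harness (HumanEval, MBPP, ...)
        print(f"Running bigcode-evaluation-harness tasks {tasks!r} on {checkpoint_dir}")
    else:
        raise ValueError(f"Unknown framework: {framework!r}")


if __name__ == "__main__":
    evaluate("checkpoints/my-model", "humaneval",
             framework="bigcode-evaluation-harness")
```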

rasbt commented 1 month ago

Thanks for the suggestion; that's a good idea in my opinion. I was just reading through https://github.com/EleutherAI/lm-evaluation-harness/issues/1157, and HumanEval and MBPP might eventually come to lm-evaluation-harness, but it's hard to say when.

So, in the meantime, I think it's a good idea to add support as you suggested, with --framework "lm-evaluation-harness" as the default. (Please feel free to open a PR if you are interested and have time.)