bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0
698 stars 180 forks source link

API-based evaluation support (humanevalpack_openai.py is too old) #234

Open s-natsubori opened 1 month ago

s-natsubori commented 1 month ago

I am trying evaluate API base model . (Local model working on vLLM Engine, and vLLM provide OpenAI Compatible API interface)

but tasks/humanevalpack_openai.py is not updated. (postprocess_generation is not applied, save format is incompatible etc) So I can't pass generation results to evaluate.

Do you have any plans to update it in the future?

If API-based evaluation is applied to all benchmarks, more models can be evaluated easily.