API-based evaluation support (humanevalpack_openai.py is too old)

I am trying to evaluate an API-based model. (The model runs locally on the vLLM engine, and vLLM provides an OpenAI-compatible API.)

However, tasks/humanevalpack_openai.py is out of date: postprocess_generation is not applied, the saved output format is incompatible, etc. As a result, I can't pass the generation results on to the evaluation step.
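A minimal sketch of the workaround I have in mind, under a few assumptions: the vLLM server exposes the OpenAI-compatible completions endpoint at a base URL such as http://localhost:8000/v1; the harness's `--load_generations_path` expects a JSON list with one inner list of generation strings per problem (the shape `--save_generations` writes); and each task's `postprocess_generation(generation, idx)` expects the prompt followed by the completion, as in the local generation path. The helper names (`make_vllm_complete`, `generate_for_harness`, `save_generations`) are hypothetical and not part of the harness.

```python
import json
from typing import Callable, List


def make_vllm_complete(base_url: str, model: str, max_tokens: int = 512,
                       temperature: float = 0.2) -> Callable[[str, int], List[str]]:
    """Return a function that requests n completions from an OpenAI-compatible server (e.g. vLLM)."""
    # Imported lazily so the formatting helpers below work without the package installed.
    from openai import OpenAI
    # vLLM does not check the key unless it was started with an API key configured.
    client = OpenAI(base_url=base_url, api_key="EMPTY")

    def complete(prompt: str, n: int) -> List[str]:
        resp = client.completions.create(
            model=model, prompt=prompt, n=n,
            max_tokens=max_tokens, temperature=temperature,
        )
        return [choice.text for choice in resp.choices]

    return complete


def generate_for_harness(prompts: List[str],
                         complete: Callable[[str, int], List[str]],
                         n_samples: int,
                         postprocess: Callable[[str, int], str]) -> List[List[str]]:
    """Build one list of postprocessed generations per problem (assumed harness format)."""
    generations = []
    for idx, prompt in enumerate(prompts):
        raw = complete(prompt, n_samples)
        # Assumption: postprocess_generation expects prompt + completion, like the local path.
        generations.append([postprocess(prompt + text, idx) for text in raw])
    return generations


def save_generations(generations: List[List[str]], path: str) -> None:
    """Write generations as JSON so they can be passed via --load_generations_path."""
    with open(path, "w") as f:
        json.dump(generations, f)
```

Usage, assuming `task` is a harness task object: build `prompts = [task.get_prompt(doc) for doc in task.get_dataset()]`, call `generate_for_harness(prompts, make_vllm_complete("http://localhost:8000/v1", "my-model"), n_samples, task.postprocess_generation)` (where "my-model" is a placeholder for the served model name), save the result, and run the harness with `--load_generations_path` pointing at that file plus `--allow_code_execution`. Passing the completion function in as a parameter keeps the formatting logic testable without a running server.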

Do you have any plans to update it in the future?

If API-based evaluation were supported for all benchmarks, many more models could be evaluated easily.

bigcode-project/bigcode-evaluation-harness, issue #234