I am trying to evaluate an API-based model.
(The local model runs on the vLLM engine, and vLLM provides an OpenAI-compatible API interface.)
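For context, this is how I query the model: a minimal sketch, assuming vLLM is serving at its default address (`http://localhost:8000/v1`); the model name and prompt are placeholders.

```python
# Minimal sketch: querying a local vLLM server through its
# OpenAI-compatible endpoint. The base_url/port and model name
# are assumptions -- adjust to your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default vLLM server address
    api_key="EMPTY",                      # vLLM does not check the key by default
)

response = client.completions.create(
    model="my-local-model",    # hypothetical: the name the model was served under
    prompt="def add(a, b):\n",
    max_tokens=256,
    temperature=0.2,
)
print(response.choices[0].text)
```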
However, `tasks/humanevalpack_openai.py` is out of date:
`postprocess_generation` is not applied, the save format is incompatible with the evaluation step, etc.
As a result, I cannot pass the generation results on to evaluation.
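For reference, this is the kind of post-processing and saving step I would expect, sketched under two assumptions: that the task exposes `postprocess_generation(generation, idx)` as in the other tasks, and that evaluation expects a JSON list of lists (one inner list of completions per problem). The helper name here is illustrative, not the actual `humanevalpack_openai.py` API.

```python
# Hypothetical workaround, assuming the harness's usual conventions:
# postprocess_generation(generation, idx) per task, and a JSON file
# containing a list of lists (one inner list per problem).
import json

def save_generations(task, raw_generations, path="generations.json"):
    """raw_generations: list (per problem) of lists of raw model outputs."""
    processed = [
        [task.postprocess_generation(gen, idx) for gen in gens]
        for idx, gens in enumerate(raw_generations)
    ]
    with open(path, "w") as f:
        json.dump(processed, f)
```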
Do you have any plans to update it in the future?
If API-based evaluation were supported across all benchmarks, many more models could be evaluated easily.