bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Is it possible to run the harness against API hosted models? #148

Open pnewhook opened 11 months ago

pnewhook commented 11 months ago

I have a model that's only available through a RESTful API, and need to get some benchmarks. I'd like to run MultiPL-E benchmarks with a few languages. Has any work gone into using bigcode-evaluation-harness to perform generation with an API instead of on the local machine?

lm-evaluation-harness has the ability to run against commercial APIs, especially OpenAI.

loubnabnl commented 11 months ago

Hello, we currently don't support external APIs, only generations with transformers. Feel free to open a PR if you have something in mind.

If you're interested in HumanEvalPack benchmarks and OpenAI models, there's a task that supports it (docs here: https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/lm_eval/tasks/humanevalpack_openai.py)

krrishdholakia commented 11 months ago

Is there a way I could 'fake' a local model and have it call a hosted API endpoint? @pnewhook @loubnabnl

loubnabnl commented 10 months ago

I don't think that's possible with the current setup, which loads models through transformers and assumes you have a local model checkpoint.
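
As a rough illustration of what such a "fake" local model could look like, here is a minimal sketch of an adapter that exposes a `generate()`-style method but forwards prompts to a hosted REST endpoint instead of running a local checkpoint. The class name, endpoint URL, request/response schema, and injectable `request_fn` are all assumptions for illustration, not part of the harness or any real API:

```python
import json
import urllib.request


class APIBackedGenerator:
    """Hypothetical adapter: mimics a minimal generate() interface
    but forwards prompts to a hosted REST endpoint instead of
    running generation with a local transformers checkpoint."""

    def __init__(self, endpoint, request_fn=None):
        self.endpoint = endpoint
        # request_fn is injectable for offline testing;
        # defaults to a real HTTP POST against the endpoint.
        self._request = request_fn or self._http_post

    def _http_post(self, payload):
        # Assumed wire format: JSON in, JSON out.
        req = urllib.request.Request(
            self.endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def generate(self, prompts, max_tokens=256):
        # Return one completion string per prompt, mirroring the
        # list-of-strings shape local generation would produce.
        outputs = []
        for prompt in prompts:
            result = self._request(
                {"prompt": prompt, "max_tokens": max_tokens}
            )
            outputs.append(result["completion"])
        return outputs
```

Wiring something like this into the harness would still require replacing the transformers loading path with the adapter, which is the part the maintainers note is not currently supported.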