bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Is it possible to run the harness against API hosted models? #148

Open pnewhook opened 1 year ago

pnewhook commented 1 year ago

I have a model that's only available through a RESTful API, and need to get some benchmarks. I'd like to run MultiPL-E benchmarks with a few languages. Has any work gone into using bigcode-evaluation-harness to perform generation with an API instead of on the local machine?

lm-evaluation-harness has the ability to run against commercial APIs, especially OpenAI.

loubnabnl commented 1 year ago

Hello, we currently don't support external APIs, only generations with transformers. Feel free to open a PR if you have something in mind.

If you're interested in HumanEvalPack benchmarks and OpenAI models, there's a task that supports it (docs here): https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/lm_eval/tasks/humanevalpack_openai.py

krrishdholakia commented 1 year ago

Is there a way I could 'fake' a local model and have it call a hosted API endpoint? @pnewhook @loubnabnl

loubnabnl commented 1 year ago

I don't think that's possible with the current setup, which loads models through transformers and assumes you have the model checkpoint locally.
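One possible workaround, rather than faking a local model, is to do generation yourself against the hosted API and hand the harness only the finished generations for evaluation (the harness has a `--load_generations_path` option for evaluating pre-generated solutions). The sketch below is a minimal illustration of that idea, not part of the harness: the endpoint URL, request/response payload shape, and the assumption that generations are a JSON list of lists of full solutions (prompt + completion, one inner list per problem) are all assumptions you'd need to check against your API and the harness docs.

```python
import json
import urllib.request

# Hypothetical REST endpoint and payload shape -- adjust to your API.
API_URL = "https://example.com/v1/completions"

def query_api(prompt, n_samples):
    """Call the hosted model and return n_samples completion strings.
    The request/response fields here are assumptions, not a real spec."""
    payload = json.dumps({"prompt": prompt, "n": n_samples}).encode()
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["completions"]  # assumed response field

def build_generations(prompts, n_samples, generate_fn=query_api):
    """Build the nested list-of-lists layout (one inner list per problem)
    that the harness is assumed to expect for --load_generations_path,
    with each entry being the prompt plus the model's completion."""
    return [
        [prompt + completion for completion in generate_fn(prompt, n_samples)]
        for prompt in prompts
    ]

if __name__ == "__main__":
    # Example with a single toy prompt; real usage would iterate the
    # task's dataset prompts in the harness's order.
    prompts = ["def add(a, b):\n"]
    generations = build_generations(prompts, n_samples=2)
    with open("generations.json", "w") as f:
        json.dump(generations, f)
```

You could then point the harness at the saved file for evaluation only, with something along the lines of `python main.py --tasks humaneval --load_generations_path generations.json --allow_code_execution` (check the README for the exact flags for your task).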