bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Make `main.py` compatible with OpenAI compatible APIs #189

Open hmellor opened 5 months ago

hmellor commented 5 months ago

Solves #161 and #148 and is an alternative to #179.

Employs the DRY principle by changing only the creation of the Evaluator class in main.py and the generation.parallel_generations function. Therefore, we won't need to maintain multiple Evaluator classes in parallel.

Using the completions endpoint instead of chat.completions was a design choice: it eliminates errors and confusion caused by additional chat templating taking place behind the API.

If you want to evaluate a model running behind an OpenAI-compatible API, you can use `base_url` to send all generation requests to that URL.
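As a sketch of what such a request looks like (the server URL and model name below are placeholders, not values from this PR; any OpenAI-compatible server such as vLLM would work), the harness-style call hits the plain completions endpoint with raw prompt text:

```python
import json
import urllib.request

# Hypothetical values: an OpenAI-compatible server exposed locally and a
# placeholder model name -- substitute your own.
BASE_URL = "http://localhost:8000/v1"

def build_completions_request(prompt: str, model: str, max_tokens: int = 128):
    """Build a request against the plain /completions endpoint.

    The prompt is sent as raw text, so no chat template is applied
    behind the API -- the design choice described above.
    """
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer EMPTY",  # many local servers ignore the key
        },
    )

req = build_completions_request("def fibonacci(n):", "my-code-model")
print(req.full_url)  # http://localhost:8000/v1/completions
```

The official `openai` Python client accepts the same `base_url` at construction time, so either approach reaches the same endpoint.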

hmellor commented 5 months ago

@loubnabnl, if you have time I'd appreciate a review, thanks!

tshrjn commented 5 months ago

Seems like there is an issue with chat format:

    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "'messages' is a required property", 'type': 'invalid_request_error', 'param': None, 'code': None}}
  0%|                                                                                                                                                      | 0/164 [00:02<?, ?it/s]
Task exception was never retrieved
future: <Task finished name='Task-4' coro=<tqdm_asyncio.gather.<locals>.wrap_awaitable() done, defined at /opt/homebrew/anaconda3/envs/env_name/lib/python3.10/site-packages/tqdm/asyncio.py:75> exception=BadRequestError('Error code: 400 - {\'error\': {\'message\': "\'messages\' is a required property", \'type\': \'invalid_request_error\', \'param\': None, \'code\': None}}')>
Traceback (most recent call last):
  File "/opt/homebrew/anaconda3/envs/env_name/lib/python3.10/site-packages/tqdm/asyncio.py", line 76, in wrap_awaitable
    return i, await f
  File "/opt/homebrew/anaconda3/envs/env_name/lib/python3.10/site-packages/openai/resources/completions.py", line 1020, in create
    return await self._post(
  File "/opt/homebrew/anaconda3/envs/env_name/lib/python3.10/site-packages/openai/_base_client.py", line 1705, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/opt/homebrew/anaconda3/envs/env_name/lib/python3.10/site-packages/openai/_base_client.py", line 1408, in request
    return await self._request(
  File "/opt/homebrew/anaconda3/envs/env_name/lib/python3.10/site-packages/openai/_base_client.py", line 1499, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "'messages' is a required property", 'type': 'invalid_request_error', 'param': None, 'code': None}}
hmellor commented 5 months ago

@tshrjn you're going to need to provide more context; the word "chat" doesn't feature in my PR at all.

In the PR description I explicitly state that I am not using the chat endpoint, so I don't know what you did to get a chat error.
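For reference, the two endpoints take differently shaped request bodies, which is why a server expecting chat input rejects a plain completions request with "'messages' is a required property". A minimal illustration (the model name is a placeholder; the field names follow the OpenAI API schema):

```python
# /v1/completions: raw prompt text, no chat templating applied server-side.
completions_body = {
    "model": "my-model",
    "prompt": "def add(a, b):",
    "max_tokens": 64,
}

# /v1/chat/completions: requires a `messages` list -- the exact field the
# 400 error in the traceback above reports as missing.
chat_body = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Write an add function."}],
    "max_tokens": 64,
}

print("prompt" in completions_body, "messages" in chat_body)  # True True
```

So hitting a chat-only endpoint (or a server that routes to one) with a completions-shaped body produces exactly the 400 shown above.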

nielstron commented 1 week ago

I tested this branch and it worked perfectly fine. Only caveat: it really only works with completion models (e.g. babbage and davinci at OpenAI), not with chat models. But this is expected given the format of the benchmark.