Open ManuelFay opened 10 months ago
Hi! We'd love to move toward hosting models as endpoints to make evaluation faster and more lightweight than using HF models locally.
Adding vLLM, TGI, and support for inference on a separate machine / in a subprocess is on the roadmap long-term, but we don't have an ETA on it--if you are interested in helping contribute such a feature, let us know!
I am, but I won't have time over the next couple of weeks and will probably resort to using the lm-eval-harness as is (or add a few tasks)! Thanks again for the great work!
> Adding vLLM, TGI, and support for inference on a separate machine / in a subprocess is on the roadmap long-term
@haileyschoelkopf I've been looking into this idea a bit, as it's something that would be incredibly useful for my organization. One thing I'm curious about: is there a clear API protocol an external model would need to satisfy to be compatible with lm-eval-harness? For instance, HELM recently introduced support for externally hosted models for the NeurIPS challenge, where encoding/decoding of tokens is handled externally by the service. That protocol involves three POST endpoints: `/encode`, `/decode`, and `/process`.
Is there a single protocol that a vLLM or TGI powered service would have to satisfy to be queryable by lm-eval-harness?
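For concreteness, here is a sketch of what request bodies for such a three-endpoint protocol could look like. The endpoint names come from the HELM challenge mentioned above, but every field name below is an assumption for illustration, not a documented spec:

```python
# Hypothetical request bodies for a HELM-style three-endpoint protocol.
# Endpoint names (/encode, /decode, /process) are from the HELM NeurIPS
# challenge mentioned above; every field name here is an assumption.

def encode_request(text: str) -> dict:
    """Body for POST /encode: raw text in, token ids out."""
    return {"text": text}

def decode_request(tokens: list) -> dict:
    """Body for POST /decode: token ids in, raw text out."""
    return {"tokens": tokens}

def process_request(prompt: str, max_new_tokens: int = 16,
                    temperature: float = 0.0) -> dict:
    """Body for POST /process: one generation request."""
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }
```

Whatever the final protocol, pinning down whether tokenization lives on the client or the server seems like the key design decision for the harness.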
Cheers, Sean
I believe LiteLLM can help with this: we let you call TGI LLMs in the OpenAI Completion input/output format. Thanks @Vinno97! cc @ManuelFay @haileyschoelkopf
Here's a tutorial on using our OpenAI proxy server to call HF TGI models with the lm-evaluation-harness. Docs: https://docs.litellm.ai/docs/tutorials/lm_evaluation_harness
Step 1: Start the local proxy

```shell
litellm --model huggingface/bigcode/starcoder
```

This exposes an OpenAI-compatible endpoint at http://0.0.0.0:8000/.

Step 2: Set the OpenAI API base

```shell
export OPENAI_API_BASE="http://0.0.0.0:8000"
```

Step 3: Run lm-eval-harness

```shell
python3 main.py \
  --model gpt3 \
  --model_args engine=huggingface/bigcode/starcoder \
  --tasks hellaswag
```
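Once the proxy is up, any OpenAI-style completions client can talk to it, not just the harness. A minimal stdlib-only sketch for checking the endpoint by hand (the `/v1/completions` route and response shape follow the OpenAI completions convention the proxy emulates; treat both as assumptions to verify locally):

```python
# Stdlib-only sketch of hitting the proxy directly, outside the harness.
# Route and response shape follow the OpenAI completions convention the
# proxy emulates; verify both against your proxy version.
import json
import urllib.request

API_BASE = "http://0.0.0.0:8000"  # matches OPENAI_API_BASE from Step 2

def completion_payload(prompt, model="huggingface/bigcode/starcoder",
                       max_tokens=16):
    """Build an OpenAI-style completions request body."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

def complete(prompt):
    """POST the payload to the proxy and return the generated text."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/completions",
        data=json.dumps(completion_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```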
That's very cool, thanks!
I have a problem with your code snippet @ishaan-jaff:

```
KeyError: 'Could not automatically map huggingface/my_model to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.'
```
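This is not tiktoken's actual code, but a toy reconstruction of why the lookup fails: tiktoken maps known OpenAI model names to encodings, so a Hugging Face model name has no entry, and the fallback the message suggests is to pick an encoding explicitly with `tiktoken.get_encoding`:

```python
# Toy reconstruction of the failure (NOT tiktoken's actual code): tiktoken
# keeps a table of known OpenAI model names -> encodings, so any other
# model name raises the KeyError above.
MODEL_TO_ENCODING = {            # heavily abridged, illustrative only
    "text-davinci-003": "p50k_base",
    "gpt-3.5-turbo": "cl100k_base",
}

def encoding_for_model(model_name):
    try:
        return MODEL_TO_ENCODING[model_name]
    except KeyError:
        raise KeyError(
            f"Could not automatically map {model_name} to a tokeniser. "
            "Please use tiktoken.get_encoding to explicitly get the "
            "tokeniser you expect."
        )
```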
@ManuelFay are you on the big-refactor branch? Can I see your code?
Yup, big-refactor branch:

```shell
litellm --model "huggingface/manu/llama-oscar-fr"
python main.py --model openai-completions --model_args engine=huggingface/manu/llama-oscar-fr --tasks hellaswag
```
(Not sure we should continue this discussion here though; it doesn't relate to the issue.)

Agreed. I sent you a LinkedIn request @ManuelFay; you can also DM me on Discord about this: https://discord.com/invite/wuPM9dRgDw
Since HF TGI's PR was merged, it should be possible to integrate TGI endpoints into the APIs supported by the lm-evaluation-harness.
Any plans to do so? This would decouple the evaluation machine from the served model and greatly facilitate both evaluation and hosting!
Thanks a lot for the great work!
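For context on what such an integration would wrap, TGI's documented generation route is `POST /generate`, taking `{"inputs": ..., "parameters": {...}}` and returning `{"generated_text": ...}`. A minimal stdlib sketch (the URL is a placeholder for a deployed endpoint):

```python
# Sketch of the call a TGI integration would wrap. The /generate route and
# inputs/parameters/generated_text fields follow TGI's documented API; the
# URL below is a placeholder.
import json
import urllib.request

TGI_URL = "http://localhost:8080"  # placeholder for a deployed TGI endpoint

def generate_payload(prompt, max_new_tokens=32):
    """Request body for TGI's POST /generate route."""
    return {"inputs": prompt,
            "parameters": {"max_new_tokens": max_new_tokens}}

def generate(prompt):
    """Send one generation request to the endpoint and return the text."""
    req = urllib.request.Request(
        f"{TGI_URL}/generate",
        data=json.dumps(generate_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

A harness backend would mostly need to batch prompts into such requests and map the responses back onto the harness's loglikelihood/generation interface.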