Open ManuelFay opened 10 months ago
Hi! We'd love to move toward hosting models as endpoints to make evaluation faster and more lightweight than using HF models locally.
Adding vLLM, TGI, and support for inference on a separate machine / in a subprocess is on the roadmap long-term, but we don't have an ETA on it--if you are interested in helping contribute such a feature, let us know!
I am, but I won't have time over the next couple of weeks and will probably resort to using the lm-eval-harness as is (or add a few tasks)! Thanks again for the great work!
> Adding vLLM, TGI, and support for inference on a separate machine / in a subprocess is on the roadmap long-term
@haileyschoelkopf I've been looking into this idea a bit, as it's something that would be incredibly useful for my organization. One thing I'm curious about: is there a clear API protocol an external model would need to satisfy to be compatible with lm-eval-harness? For instance, HELM recently introduced support for externally hosted models for the NeurIPS challenge, where encoding/decoding of tokens is handled externally by the service. That protocol involves three POST endpoints: `/encode`, `/decode`, and `/process`.
Is there a single protocol that a vLLM or TGI powered service would have to satisfy to be queryable by lm-eval-harness?
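For concreteness, here is a sketch of what request bodies for such a three-endpoint protocol could look like. The endpoint names come from the HELM challenge mentioned above, but every field name below is an assumption for illustration, not a documented spec:

```python
# Hypothetical request bodies for a HELM-style three-endpoint protocol.
# Endpoint names (/encode, /decode, /process) are from the HELM NeurIPS
# challenge mentioned above; every field name here is an assumption.

def encode_request(text: str) -> dict:
    """Body for POST /encode: raw text in, token ids out."""
    return {"text": text}

def decode_request(tokens: list) -> dict:
    """Body for POST /decode: token ids in, raw text out."""
    return {"tokens": tokens}

def process_request(prompt: str, max_new_tokens: int = 16,
                    temperature: float = 0.0) -> dict:
    """Body for POST /process: one generation request."""
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }
```

Whatever the final protocol, pinning down whether tokenization lives on the client or the server seems like the key design decision for the harness.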
Cheers, Sean
I believe LiteLLM can help with this: we let you call TGI LLMs in the OpenAI Completion input/output format. Thanks @Vinno97! cc @ManuelFay @haileyschoelkopf
Here's a tutorial on using our OpenAI proxy server to call HF TGI models with the lm-evaluation-harness. Docs: https://docs.litellm.ai/docs/tutorials/lm_evaluation_harness
Step 1: Start the local proxy

```shell
litellm --model huggingface/bigcode/starcoder
```

This exposes an OpenAI-compatible endpoint at http://0.0.0.0:8000/.

Step 2: Set the OpenAI API base

```shell
export OPENAI_API_BASE="http://0.0.0.0:8000"
```

Step 3: Run lm-eval-harness

```shell
python3 main.py \
  --model gpt3 \
  --model_args engine=huggingface/bigcode/starcoder \
  --tasks hellaswag
```
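Once the proxy is up, any OpenAI-style completions client can talk to it, not just the harness. A minimal stdlib-only sketch for checking the endpoint by hand (the `/v1/completions` route and response shape follow the OpenAI completions convention the proxy emulates; treat both as assumptions to verify locally):

```python
# Stdlib-only sketch of hitting the proxy directly, outside the harness.
# Route and response shape follow the OpenAI completions convention the
# proxy emulates; verify both against your proxy version.
import json
import urllib.request

API_BASE = "http://0.0.0.0:8000"  # matches OPENAI_API_BASE from Step 2

def completion_payload(prompt, model="huggingface/bigcode/starcoder",
                       max_tokens=16):
    """Build an OpenAI-style completions request body."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

def complete(prompt):
    """POST the payload to the proxy and return the generated text."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/completions",
        data=json.dumps(completion_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```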
That's very cool, thanks!
I have a problem with your code snippet @ishaan-jaff:

```
KeyError: 'Could not automatically map huggingface/my_model to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.'
```
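This is not tiktoken's actual code, but a toy reconstruction of why the lookup fails: tiktoken maps known OpenAI model names to encodings, so a Hugging Face model name has no entry, and the fallback the message suggests is to pick an encoding explicitly with `tiktoken.get_encoding`:

```python
# Toy reconstruction of the failure (NOT tiktoken's actual code): tiktoken
# keeps a table of known OpenAI model names -> encodings, so any other
# model name raises the KeyError above.
MODEL_TO_ENCODING = {            # heavily abridged, illustrative only
    "text-davinci-003": "p50k_base",
    "gpt-3.5-turbo": "cl100k_base",
}

def encoding_for_model(model_name):
    try:
        return MODEL_TO_ENCODING[model_name]
    except KeyError:
        raise KeyError(
            f"Could not automatically map {model_name} to a tokeniser. "
            "Please use tiktoken.get_encoding to explicitly get the "
            "tokeniser you expect."
        )
```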
@ManuelFay are you on the big-refactor branch? Can I see your code?
Yup, big-refactor branch:

```shell
litellm --model "huggingface/manu/llama-oscar-fr"
python main.py --model openai-completions --model_args engine=huggingface/manu/llama-oscar-fr --tasks hellaswag
```
(Not sure we should continue this discussion here though; it doesn't relate to the issue.)

Agreed. I sent you a LinkedIn request @ManuelFay; you can also DM me on Discord about this: https://discord.com/invite/wuPM9dRgDw
Since HF TGI's PR was merged, it should be possible to integrate TGI endpoints into the APIs supported by the lm-evaluation-harness.
Any plans to do so? This would decouple the evaluation machine from the served model and greatly facilitate both evaluation and hosting!
Thanks a lot for the great work!
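For context on what such an integration would wrap, TGI's documented generation route is `POST /generate`, taking `{"inputs": ..., "parameters": {...}}` and returning `{"generated_text": ...}`. A minimal stdlib sketch (the URL is a placeholder for a deployed endpoint):

```python
# Sketch of the call a TGI integration would wrap. The /generate route and
# inputs/parameters/generated_text fields follow TGI's documented API; the
# URL below is a placeholder.
import json
import urllib.request

TGI_URL = "http://localhost:8080"  # placeholder for a deployed TGI endpoint

def generate_payload(prompt, max_new_tokens=32):
    """Request body for TGI's POST /generate route."""
    return {"inputs": prompt,
            "parameters": {"max_new_tokens": max_new_tokens}}

def generate(prompt):
    """Send one generation request to the endpoint and return the text."""
    req = urllib.request.Request(
        f"{TGI_URL}/generate",
        data=json.dumps(generate_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

A harness backend would mostly need to batch prompts into such requests and map the responses back onto the harness's loglikelihood/generation interface.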