Do you (plan to) support open-source LLMs from Hugging Face?
Hi, we use vLLM, which supports inference with open-source LLMs from Hugging Face. Simply specify api="vllm" and model=[huggingface repo id] to use this feature.
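For context, plain vLLM offline inference with a Hugging Face repo id looks roughly like the sketch below; the repo id and sampling settings are just illustrative, while the integration itself only needs the api and model arguments mentioned above.

```python
from vllm import LLM, SamplingParams

# Load any Hugging Face model by its repo id (example id shown here).
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

# Illustrative sampling settings.
params = SamplingParams(temperature=0.7, max_tokens=128)

# Offline inference: the model runs in-process, no server involved.
outputs = llm.generate(["What is offline inference?"], params)
print(outputs[0].outputs[0].text)
```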
Does your vLLM integration use offline or online inference, or are both possible?
The integration only uses offline inference for now. Do you need online inference?
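(For anyone reading along: "online" inference would mean querying a separately running vLLM server through its OpenAI-compatible endpoint instead of loading the model in-process, roughly as sketched below. The integration does not do this; the endpoint URL and model id here are just examples.)

```python
# Online inference sketch: assumes a vLLM server was started separately, e.g.
#   vllm serve mistralai/Mistral-7B-Instruct-v0.2
# and is listening on the default port. Not what this integration does.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[{"role": "user", "content": "What is online inference?"}],
)
print(response.choices[0].message.content)
```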
Great, offline inference is what I am looking for. Thanks for your swift reply!