[Open] sungkim11 opened 4 months ago
I think it should be possible to serve GritLM using vLLM or a similar engine, exposing its embedding capability, its language-modeling capability, or both from a single model/endpoint, but I'm not sure about the details of vLLM itself.
Specifically, I would like to run embeddings as a service using something like vLLM in a Docker container on a different host. How would one go about doing this?
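For what it's worth, a minimal sketch of what I had in mind: vLLM ships an OpenAI-compatible server and an official Docker image (`vllm/vllm-openai`), so on the serving host something like the following might work. This assumes the image name, the port, and the `GritLM/GritLM-7B` model id; the embedding-task flag has changed across vLLM versions (`--task embedding` in older releases, `--task embed` in newer ones), so check the docs for the version you pull.

```shell
# Deployment sketch (assumptions: vllm/vllm-openai image, GPU host,
# GritLM/GritLM-7B model id; task-flag spelling depends on vLLM version)
docker run --gpus all \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model GritLM/GritLM-7B \
  --task embed
```

A client on another host could then call the standard OpenAI embeddings endpoint at `http://SERVING_HOST:8000/v1/embeddings` with the same model name. Whether one container can expose both the embedding task and the generative task at once is exactly the part I'm unsure about.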