Lightning-AI / LitServe

Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.
https://lightning.ai/docs/litserve
Apache License 2.0
2.21k stars 134 forks

Embedding model support with openai spec #305

Open riyajatar37003 opened 4 days ago

riyajatar37003 commented 4 days ago

Hi, can I deploy my own custom embedding model with LitServe? Is there any documentation on this?

grumpyp commented 4 days ago

There's a great tutorial on how to do this from one of the maintainers, @aniketmaurya:

https://lightning.ai/lightning-ai/studios/deploy-text-embedding-api-with-litserve

You'd basically load the model in the setup method and then use it in the predict method.

riyajatar37003 commented 4 days ago

It's not OpenAI compatible; that's what I am looking for.

riyajatar37003 commented 4 days ago

from sentence_transformers import SentenceTransformer
import litserve as ls


class EmbeddingAPI(ls.LitAPI):
    def setup(self, device):
        # Load the model once per worker; the instruction is prepended to every query.
        self.instruction = "Represent this sentence for searching relevant passages: "
        self.model = SentenceTransformer('BAAI/bge-large-en-v1.5', device=device)

    def decode_request(self, request):
        # Pull the text to embed out of the JSON request body.
        return request["input"]

    def predict(self, query):
        return self.model.encode([self.instruction + query], normalize_embeddings=True)

    def encode_response(self, output):
        # Convert the NumPy vector to a JSON-serializable list.
        return {"embedding": output[0].tolist()}


if __name__ == "__main__":
    api = EmbeddingAPI()
    server = ls.LitServer(api, api_path='/embeddings')
    server.run(port=8000)

import litellm
import os

response = litellm.embedding(
    model="openai/mymodel",  # add openai/ prefix to model so litellm knows to route to OpenAI
    api_key="No-key",  # api key to your openai compatible endpoint
    api_base="http://127.0.0.1:8000",  # set API Base of your Custom OpenAI Endpoint
    input="good morning from litellm",
)

print(response)

bhimrazy commented 4 days ago

Hi @riyajatar37003, this studio might be helpful for your use case: https://lightning.ai/bhimrajyadav/studios/deploy-openai-like-embedding-api-with-litserve-on-studios.

It also includes additional features like support for different models. Feel free to modify it to suit your needs.

riyajatar37003 commented 4 days ago

Thanks, that part is clear, but I am trying to use it with the litellm proxy server in the following way:

import litellm
import os

response = litellm.embedding(
    model="openai/mymodel",  # add openai/ prefix to model so litellm knows to route to OpenAI
    api_key="No-key",  # api key to your openai compatible endpoint
    api_base="http://127.0.0.1:8000/",  # set API Base of your Custom OpenAI Endpoint
    input="good morning from litellm",
)

but I am getting this error: NotFoundError: litellm.NotFoundError: NotFoundError: OpenAIException - Error code: 404 - {'detail': 'Not Found'}

aniketmaurya commented 4 days ago

@riyajatar37003 currently LitServe doesn't have a built-in way to serve an OpenAI-compatible embedding model. It can be implemented using the OpenAISpec class.

Would love to see a contribution if you are interested 😄

Demirrr commented 3 days ago

Stating that LitServe has OpenAI compatibility is an overstatement, isn't it? @aniketmaurya

https://github.com/Lightning-AI/LitServe?tab=readme-ov-file#features https://lightning.ai/docs/litserve/features/open-ai-spec#openai-api

vLLM is a perfect example of OpenAI compatibility: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html

riyajatar37003 commented 3 days ago

Yes, but does it support embeddings?

aniketmaurya commented 3 days ago

@riyajatar37003 no, it doesn't - https://github.com/Lightning-AI/LitServe/issues/305#issuecomment-2382891574

aniketmaurya commented 3 days ago

> Stating that Litserve has the Open AI compatibility is an overstatement isn't it ? @aniketmaurya
>
> Lightning-AI/LitServe#features lightning.ai/docs/litserve/features/open-ai-spec#openai-api
>
> vllm is a perfect example for the Open AI compatibility docs.vllm.ai/en/latest/serving/openai_compatible_server.html

@Demirrr I don't see anything that can be done with vLLM but not with LitServe's OpenAISpec. I might be missing something. What are you trying to do with LitServe that you are unable to do?

Demirrr commented 3 days ago
  1. vLLM supports embedding models, and LitServe currently does not, as you have also pointed out.
  2. vLLM supports the guided_choice option and a few more useful options (https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters-for-completions-api); e.g., the following computation cannot be carried out in LitServe.

completion = client.chat.completions.create(
  model="NousResearch/Meta-Llama-3-8B-Instruct",
  messages=[
    {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
  ],
  extra_body={
    "guided_choice": ["positive", "negative"]
  }
)
  3. Autoscaling, multi-machine inference, and authentication features are free in vLLM.

Currently, I am unable to see the advantages of using LitServe over vLLM. Please correct me if any of the points above are wrong.

riyajatar37003 commented 3 days ago

Where do you see that vllm supports embedding models?

Demirrr commented 3 days ago

https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_embedding.py @riyajatar37003

riyajatar37003 commented 3 days ago

It's offline inference, not similar to how a decoder model is served. Is my understanding correct?

aniketmaurya commented 1 day ago

@Demirrr LitServe is a generic model serving library, not only for LLMs.

At the moment, it provides OpenAISpec support for the chat completion API only. To serve an embedding model with the OpenAI API, you can use a Pydantic input request to build an OpenAI-compatible format.
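
As a rough illustration of what "OpenAI-compatible format" means for embeddings, here is a minimal sketch of the request and response shapes. It is not LitServe's API: a standard-library dataclass stands in for the Pydantic model so the snippet runs without extra dependencies, and `EmbeddingRequest`, `normalize_input`, `build_embedding_response`, `fake_embed`, and the model name are hypothetical. In a LitAPI like the one posted earlier in this thread, the input handling would presumably live in decode_request and the response wrapping in encode_response.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class EmbeddingRequest:
    """Subset of the OpenAI embeddings request body (stand-in for a Pydantic model)."""
    input: Union[str, List[str]]
    model: str = "my-embedding-model"  # hypothetical model name


def normalize_input(request: EmbeddingRequest) -> List[str]:
    """The OpenAI format accepts one string or a list of strings; always return a list."""
    return [request.input] if isinstance(request.input, str) else list(request.input)


def build_embedding_response(vectors: List[List[float]], model: str,
                             prompt_tokens: int = 0) -> dict:
    """Wrap raw vectors in the response layout that OpenAI clients expect."""
    return {
        "object": "list",
        "data": [
            {"object": "embedding", "index": i, "embedding": vec}
            for i, vec in enumerate(vectors)
        ],
        "model": model,
        # Token counting is left as a placeholder in this sketch.
        "usage": {"prompt_tokens": prompt_tokens, "total_tokens": prompt_tokens},
    }


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Placeholder for model.encode(...); returns a toy 2-dimensional vector per text."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


request = EmbeddingRequest(input="good morning from litellm")
texts = normalize_input(request)
response = build_embedding_response(fake_embed(texts), model=request.model)
print(response)
```

The earlier server returned {"embedding": [...]}, whereas OpenAI-style clients read the vectors from the "data" list, so the response wrapper is the part that would need to change.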

Other features like guided choice can also be implemented by customizing the decode_request method.
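
A hedged sketch of the request-handling side of that idea follows. It assumes the client sends `guided_choice` as a top-level field of the JSON body (which is where the OpenAI Python client places `extra_body` entries) and that the model can produce a score per candidate label; `extract_guided_choice`, `constrain_to_choices`, and the scores are hypothetical. A full guided-decoding implementation would constrain generation itself, whereas this only picks among the allowed answers after scoring.

```python
from typing import Dict, List, Optional


def extract_guided_choice(request: dict) -> Optional[List[str]]:
    """Read an optional list of allowed answers from the request body."""
    choices = request.get("guided_choice")
    if choices is None:
        return None
    if (not isinstance(choices, list) or not choices
            or not all(isinstance(c, str) for c in choices)):
        raise ValueError("guided_choice must be a non-empty list of strings")
    return choices


def constrain_to_choices(scores: Dict[str, float],
                         choices: Optional[List[str]]) -> str:
    """Return the best-scoring label, restricted to `choices` when given."""
    candidates = choices if choices is not None else list(scores)
    # Labels the model did not score get -inf, so they lose to any scored label.
    return max(candidates, key=lambda label: scores.get(label, float("-inf")))


request = {
    "messages": [{"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}],
    "guided_choice": ["positive", "negative"],
}
scores = {"positive": 0.7, "negative": 0.2, "neutral": 0.9}  # hypothetical model scores
answer = constrain_to_choices(scores, extract_guided_choice(request))
print(answer)
```

Here "neutral" has the highest raw score but is not an allowed choice, so the constrained answer is "positive".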

Autoscaling and authentication are free in LitServe too. Please feel free to refer to the docs (lightning.ai/litserve) and let us know if you have any feedback!