BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/
12.32k stars 1.43k forks

[Feature]: Adding support for Volcano Engine #3342

Closed Jeffwan closed 2 months ago

Jeffwan commented 4 months ago

The Feature

I am using LiteLLM to proxy requests to different providers.

Motivation, pitch

I am using Volcano Engine internally (https://www.volcengine.com/docs/82379/1133189#python) as well as OpenAI-compatible services, and I want to use LiteLLM as the proxy layer to aggregate the different providers. Right now, the one blocker is that it lacks support for Volcano Engine.
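To illustrate the aggregation idea described above, here is a minimal sketch of routing an OpenAI-format request to one of several backends by model prefix. This is not LiteLLM's internal API; the provider table and prefix convention are illustrative assumptions (the Volcano base URL is taken from the curl example in this thread).

```python
# Hypothetical routing table: provider prefix -> API base URL.
# These entries are examples, not authoritative configuration.
PROVIDER_BASES = {
    "openai": "https://api.openai.com/v1",
    "volcengine": "http://maas-api.ml-platform-cn-beijing.volces.com/api/v2",
}


def resolve_provider(model: str) -> tuple[str, str]:
    """Split a 'provider/model' string and look up the provider's base URL."""
    provider, _, model_name = model.partition("/")
    if provider not in PROVIDER_BASES:
        raise ValueError(f"unknown provider: {provider}")
    return PROVIDER_BASES[provider], model_name
```

A proxy layer like this only works if every backend accepts the same OpenAI-style `messages` payload, which is exactly what the discussion below turns on.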

Twitter / LinkedIn details

https://www.linkedin.com/in/jiaxin-shan/

Jeffwan commented 4 months ago

I checked the code, and it seems the prompt part constructs the prompt using a chat template and special tokens. Is there a way to honor the original chat template? If so, that would be easier and cleaner.

krrishdholakia commented 4 months ago

Hey @Jeffwan, this is the curl I see from the docs.

It looks like Volcano should already respect the messages format, right (no prompt formatting needed)?

// Request
curl --request POST \
  --url http://maas-api.ml-platform-cn-beijing.volces.com/api/v2/endpoint/${YOUR_ENDPOINT_ID}/chat \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --data '{
        "stream": false,
        "parameters": {
                "max_new_tokens": 1024,
                "temperature": 0.9
        },
        "messages": [
                {
                        "role": "user",
                        "content": "Hello"
                },
                {
                        "role": "assistant",
                        "content": "Hello, how can I help you?"
                },
                {
                        "role": "user",
                        "content": "What can you do?"
                }
        ]
}'

// Response
{
        "choices": [
            {
                    "message": {
                            "role": "assistant",
                            "content": "I can answer questions on topics such as history, science, technology, culture, and entertainment. I can also generate text such as summaries, articles, and stories. What would you like me to do?"
                    },
                    "finish_reason": "stop"
            }
        ],
        "usage": {
                "prompt_tokens": 20,
                "completion_tokens": 43,
                "total_tokens": 63
        }
}
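Rebuilding the request payload from the curl above as a plain Python dict makes the key point easy to check: the `messages` array already uses OpenAI-style `role`/`content` entries, so no prompt templating should be needed. (The message text is paraphrased in English; the `parameters` block is copied from the docs example.)

```python
# The Volcano Engine /chat request body from the docs, as a Python dict.
payload = {
    "stream": False,
    "parameters": {"max_new_tokens": 1024, "temperature": 0.9},
    "messages": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hello, how can I help you?"},
        {"role": "user", "content": "What can you do?"},
    ],
}

# Every message carries exactly the OpenAI-style keys and roles.
assert all(set(m) == {"role", "content"} for m in payload["messages"])
assert {m["role"] for m in payload["messages"]} <= {"user", "assistant", "system"}
```

The one visible divergence from the OpenAI schema is that sampling options sit under a `parameters` object rather than at the top level.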
krrishdholakia commented 4 months ago

the only blocker to using it today w/ litellm is that their api endpoint is /chat instead of /chat/completions?
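If the path really is the only incompatibility, a thin URL rewrite would bridge it. A minimal sketch, assuming the URL shape from the curl above (the endpoint id below is a placeholder, and this helper is hypothetical, not part of LiteLLM):

```python
def volcano_chat_url(base: str, endpoint_id: str) -> str:
    """Build the Volcano Engine chat URL, which ends in /chat
    rather than the OpenAI-style /chat/completions."""
    return f"{base.rstrip('/')}/api/v2/endpoint/{endpoint_id}/chat"


url = volcano_chat_url(
    "http://maas-api.ml-platform-cn-beijing.volces.com",
    "my-endpoint-id",  # placeholder, not a real endpoint id
)
```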

krrishdholakia commented 4 months ago

bump on this? @Jeffwan

krrishdholakia commented 4 months ago

Hey @Jeffwan is this still an ongoing blocker?

Happy to do a quick call on this to speed up the PR

https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat

Jeffwan commented 4 months ago

@krrishdholakia sorry for the late reply. It is. It uses different SDKs and role definitions, so I think we need to add a new model wrapper. I scheduled a sync-up with you.

ishaan-jaff commented 2 months ago

done here @Jeffwan https://github.com/BerriAI/litellm/pull/4433