BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Support cost mapping for OpenAI-compatible API #5008

Open xingyaoww opened 1 month ago

xingyaoww commented 1 month ago

The Feature

Support a custom field like actual_model that allows an OpenAI-compatible API to point to an existing model in https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json, so that the user can get price and context info about the model.
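For context, each entry in that file carries the pricing and context-window metadata such an alias would inherit; a quick sketch of reading one entry via the Python SDK (the exact key name is an assumption - inspect litellm.model_cost for the current one):

```python
import litellm

# model_prices_and_context_window.json is loaded into litellm.model_cost at
# import time; each entry holds per-token prices and context-window limits.
entry = litellm.model_cost["claude-3-5-sonnet-20240620"]
print(entry["input_cost_per_token"], entry["output_cost_per_token"], entry["max_tokens"])
```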

Motivation, pitch

When using the LiteLLM Proxy Server, we expose models behind OpenAI-compatible APIs. However, when users call those APIs directly through litellm as an OpenAI-compatible endpoint, they don't get the full benefit of LiteLLM features such as cost tracking and model info.

Twitter / LinkedIn details

No response

krrishdholakia commented 1 month ago

Hi @xingyaoww, you can already set custom pricing - https://docs.litellm.ai/docs/proxy/custom_pricing
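For illustration, the linked page covers setting prices in the proxy config; a rough client-side analogue with the Python SDK is sketched below (the alias and per-token prices are placeholders, not real values):

```python
import litellm

# Sketch: register hand-entered per-token prices for the proxy alias so the
# SDK can compute cost for it. Prices below are placeholders -- look them up
# for the real underlying model.
litellm.register_model({
    "openai/claude-3-5": {
        "input_cost_per_token": 3e-06,
        "output_cost_per_token": 1.5e-05,
        "litellm_provider": "openai",
        "mode": "chat",
    }
})
```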

What am I missing here?

xingyaoww commented 1 month ago

Yeah, I think that's one way to go, but it requires the user to manually look up the pricing for each model. Is it possible for the user to say that this "openai/claude-3-5" corresponds to "claude-3-5-sonnet" in the existing file (https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json) that LiteLLM maintains, so that it inherits all the settings from there?
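A hedged sketch of that mapping done manually today with the Python SDK (the cost-map key is an assumption; check litellm.model_cost for the exact name):

```python
import litellm

# Reuse the entry LiteLLM already maintains for Claude 3.5 Sonnet and
# register it under the proxy's OpenAI-compatible alias, so cost and
# context-window lookups resolve for "openai/claude-3-5" as well.
litellm.register_model({
    "openai/claude-3-5": litellm.model_cost["claude-3-5-sonnet-20240620"]
})
```

The feature request is essentially to have the proxy or SDK do this lookup automatically via a field like actual_model.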

krrishdholakia commented 1 month ago

Hey @xingyaoww, I haven't seen this before - why is your Claude an OpenAI-compatible endpoint that then gets mapped to another model on the model cost map?

Can you help me understand what you're doing?

xingyaoww commented 1 month ago

Yep - I'm setting up a LiteLLM proxy that takes Claude (and other models like Gemini) from different LLM providers (e.g., Vertex AI, Bedrock) and exposes them behind an OpenAI-compatible API, so the end user can use models from different providers through a single OpenAI-compatible base_url on the proxy.

But if the end user needs to track costs, they have to manually set things like per-token prices for each model, which can be somewhat complicated. Is there a way, from either the proxy side or the client side, to map a model to the cost and context info of a known model (e.g., LiteLLM already has cost and context info for Claude and Gemini)?

krrishdholakia commented 1 month ago

Hey @xingyaoww, the cost of the call is already returned to the end user in the response headers. They can also calculate the cost ahead of time via /spend/calculate - https://litellm-api.up.railway.app/#/Budget%20%26%20Spend%20Tracking/calculate_spend_spend_calculate_post
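A rough sketch of that pre-call estimate against a locally running proxy (the URL, API key, and request shape here are assumptions based on the linked API reference):

```python
import requests

# Ask the proxy to estimate the cost of a call before making it.
resp = requests.post(
    "http://0.0.0.0:4000/spend/calculate",
    headers={"Authorization": "Bearer sk-1234"},  # placeholder proxy key
    json={
        "model": "openai/claude-3-5",  # placeholder alias served by the proxy
        "messages": [{"role": "user", "content": "Hey, how's it going?"}],
    },
)
print(resp.json())
```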

What am I missing?

krrishdholakia commented 1 month ago

How are you getting successful Claude responses if you're passing the model as openai/claude.. for Vertex/Bedrock?

Those are called differently - https://docs.litellm.ai/docs/providers/vertex, https://docs.litellm.ai/docs/providers/bedrock
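For reference, a hedged sketch of calling those providers directly through the SDK (the model IDs are examples from the linked docs; the relevant Google Cloud / AWS credentials must be configured for this to run):

```python
from litellm import completion

# Vertex AI Anthropic model (requires Vertex project/location credentials).
completion(
    model="vertex_ai/claude-3-5-sonnet@20240620",
    messages=[{"role": "user", "content": "hi"}],
)

# Bedrock Anthropic model (requires AWS credentials).
completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": "hi"}],
)
```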

xingyaoww commented 1 month ago

> How are you getting successful Claude responses if you're passing the model as openai/claude.. for Vertex/Bedrock?

Because I deployed the LiteLLM Proxy (https://docs.litellm.ai/docs/simple_proxy) and then access the proxied model through LiteLLM again, following this: https://docs.litellm.ai/docs/proxy/quick_start#using-litellm-proxy---curl-request-openai-package-langchain

Is there a better way to do this?

Basically:

[Vertex AI / Bedrock] <----> LiteLLM Proxy <--- OpenAI-compatible API ---> Local LiteLLM completion('openai/XXX')
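For concreteness, the right-hand side of that diagram looks roughly like this (a sketch; the proxy URL, key, and model alias are placeholders):

```python
from litellm import completion

# Call the proxy's OpenAI-compatible endpoint through the local LiteLLM SDK.
response = completion(
    model="openai/claude-3-5",        # alias configured on the proxy
    api_base="http://0.0.0.0:4000",   # the proxy's OpenAI-compatible base_url
    api_key="sk-1234",                # placeholder proxy key
    messages=[{"role": "user", "content": "hi"}],
)
print(response.choices[0].message.content)
```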