BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Calling /v1/model/info does not provide correct price info #6850

Closed. Jacobh2 closed this issue 4 hours ago.

Jacobh2 commented 17 hours ago

Description

Calling the /v1/model/info endpoint does not return prices for models, even though the cost is correctly computed and recorded in the database.

I have the following config:

general_settings:
  disable_master_key_return: true
  store_model_in_db: false
litellm_settings:
  allowed_fails: 3
  callbacks: custom_callbacks.proxy_handler_instance
  json_logs: false
  num_retries: 3
  redact_messages_in_exceptions: true
  request_timeout: 600
model_list:
- litellm_params:
    api_base: https://<myurl>.openai.azure.com/
    api_key: os.environ/OPENAI_KEY
    api_version: 2024-09-01-preview
    base_model: o1-mini
    model: azure/o1-mini-no-filters
    rpm: 5000
    tpm: 50000000
  model_info:
    id: openai_o1_mini_no_filters-eastus2
    mode: chat
    region: eastus2
  model_name: openai_o1_mini_no_filters

and make the following call:

curl --location 'http://localhost:4000/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer sk-1234' \
    --data '{"model": "openai_o1_mini_no_filters","messages": [{"role": "user","content": "Give me a joke"}], "stream": false}' | jq .

I can correctly see that the call is made and the spend is recorded in the database:

[screenshot: spend for the request recorded in the database]

I can also see in the logs the following:

api         | 15:05:16 - LiteLLM Proxy:DEBUG: proxy_server.py:1063 - Runs spend update on all tables
api         | 15:05:16 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
api         | 15:05:16 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in success call: {'current_requests': 0, 'current_tpm': 238, 'current_rpm': 1}, precise_minute: 2024-11-21-15-05
api         | 15:05:16 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in success call: {'current_requests': 0, 'current_tpm': 238, 'current_rpm': 1}, precise_minute: 2024-11-21-15-05
api         | 15:05:16 - LiteLLM Proxy:DEBUG: proxy_server.py:950 - adding spend to key db. Response cost: 0.002748. Token: 88dc28d0f030c55ed4ab77ed8faf098196cb1c05df778539800c9f1243fe6b4b.
api         | 15:05:16 - LiteLLM Proxy:DEBUG: proxy_server.py:995 - adding spend to team db. Response cost: 0.002748. team_id: None.
api         | 15:05:16 - LiteLLM Proxy:DEBUG: proxy_server.py:999 - track_cost_callback: team_id is None. Not tracking spend for team
api         | 15:05:16 - LiteLLM Proxy:DEBUG: proxy_server.py:1030 - adding spend to org db. Response cost: 0.002748. org_id: None.
api         | 15:05:16 - LiteLLM Proxy:DEBUG: proxy_server.py:1036 - track_cost_callback: org_id is None. Not tracking spend for org
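As a sanity check, the logged cost of 0.002748 is consistent with o1-mini's per-token list prices; a rough back-calculation (the prices and token split below are assumptions for illustration, not values from the issue):

# Rough check of the logged "Response cost: 0.002748".
# Prices and the input/output split are assumptions, not values from the issue.
input_cost_per_token = 3e-6    # assumed o1-mini input price: $3.00 per 1M tokens
output_cost_per_token = 12e-6  # assumed o1-mini output price: $12.00 per 1M tokens
input_tokens, output_tokens = 12, 226  # assumed split; sums to the 238 tokens in the rate-limit log
cost = input_tokens * input_cost_per_token + output_tokens * output_cost_per_token
print(f"{cost:.6f}")  # 0.002748, matching the logged response cost

So cost tracking in the database clearly works; the problem is only with what the model info endpoint reports.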

But if I make a request to the model info API endpoint, the cost fields come back empty.
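The issue doesn't show the exact request; a call along these lines (assuming the same proxy at localhost:4000 and the same master key as above) should reproduce it:

curl --location 'http://localhost:4000/v1/model/info' \
    --header 'Authorization: Bearer sk-1234' | jq .

This returns the following data: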

{
    "data": [
        {
            "model_name": "openai_o1_mini_no_filters",
            "litellm_params": {
                "tpm": 50000000,
                "rpm": 5000,
                "api_base": "https://<myurl>.openai.azure.com/",
                "api_version": "2024-09-01-preview",
                "model": "azure/o1-mini-no-filters",
                "base_model": "o1-mini"
            },
            "model_info": {
                "id": "openai_o1_mini_no_filters-eastus2",
                "db_model": false,
                "mode": "chat",
                "region": "eastus2",
                "key": "azure/o1-mini-no-filters",
                "max_tokens": null,
                "max_input_tokens": null,
                "max_output_tokens": null,
                "input_cost_per_token": 0,
                "cache_creation_input_token_cost": null,
                "cache_read_input_token_cost": null,
                "input_cost_per_character": null,
                "input_cost_per_token_above_128k_tokens": null,
                "input_cost_per_query": null,
                "input_cost_per_second": null,
                "input_cost_per_audio_token": null,
                "output_cost_per_token": 0,
                "output_cost_per_audio_token": null,
                "output_cost_per_character": null,
                "output_cost_per_token_above_128k_tokens": null,
                "output_cost_per_character_above_128k_tokens": null,
                "output_cost_per_second": null,
                "output_cost_per_image": null,
                "output_vector_size": null,
                "litellm_provider": "azure",
                "supported_openai_params": [
                    "logit_bias",
                    "max_tokens",
                    "max_completion_tokens",
                    "modalities",
                    "prediction",
                    "seed",
                    "stream",
                    "temperature",
                    "max_retries",
                    "extra_headers"
                ],
                "supports_system_messages": null,
                "supports_response_schema": null,
                "supports_vision": false,
                "supports_function_calling": false,
                "supports_assistant_prefill": false,
                "supports_prompt_caching": false,
                "supports_audio_input": false,
                "supports_audio_output": false
            }
        }
    ]
}

where input_cost_per_token and output_cost_per_token are 0 and all other cost fields are null.

Expected

I would expect to get back the correct cost for this model!


krrishdholakia commented 16 hours ago

api_base: https://<myurl>.openai.azure.com/

It looks like you're using Azure.

This is because the price is calculated based on the model name returned by Azure (since the deployment name can be anything).

You can specify the base_model this maps to in order to get the relevant price information: https://docs.litellm.ai/docs/proxy/cost_tracking#spend-tracking-for-azure-openai-models

Jacobh2 commented 13 hours ago

Yes, and as you can see in the config I provided, I do set the base model. I believe this is why I get the proper cost in the database; without it, I assume the database would also record the wrong cost, or none at all.

Please stop closing issues before they are properly resolved @krrishdholakia 🙏 😅

krrishdholakia commented 4 hours ago

The base model needs to be set in model_info, not in litellm_params.

https://docs.litellm.ai/docs/proxy/cost_tracking#chat-completions--embeddings
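Applied to the config above, that means moving base_model out of litellm_params and into model_info. A minimal sketch of the corrected entry, assuming everything else stays the same:

model_list:
- litellm_params:
    api_base: https://<myurl>.openai.azure.com/
    api_key: os.environ/OPENAI_KEY
    api_version: 2024-09-01-preview
    model: azure/o1-mini-no-filters
    rpm: 5000
    tpm: 50000000
  model_info:
    base_model: o1-mini  # moved here from litellm_params so /model/info can resolve pricing
    id: openai_o1_mini_no_filters-eastus2
    mode: chat
    region: eastus2
  model_name: openai_o1_mini_no_filters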