BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: 'CompletionUsage' object has no attribute 'get' #5165

Closed. lazyhope closed this issue 1 month ago.

lazyhope commented 1 month ago

What happened?

Commit 1553f7fa4844ea4d4117c7a75d165ca2e747b81a introduced an incompatibility that causes AttributeError: 'CompletionUsage' object has no attribute 'get' to be raised from https://github.com/BerriAI/litellm/blob/dc8f9e72414ed54f34197fe379810cef71e0847a/litellm/cost_calculator.py#L494

Relevant log output

File "/Users/xxx/miniconda3/envs/langchain/lib/python3.11/site-packages/litellm/cost_calculator.py", line 725, in response_cost_calculator
    response_cost = completion_cost(
                    ^^^^^^^^^^^^^^^^
  File "/Users/xxx/miniconda3/envs/langchain/lib/python3.11/site-packages/litellm/cost_calculator.py", line 665, in completion_cost
    raise e
  File "/Users/xxx/miniconda3/envs/langchain/lib/python3.11/site-packages/litellm/cost_calculator.py", line 494, in completion_cost
    prompt_tokens = completion_response.get("usage", {}).get("prompt_tokens", 0)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/miniconda3/envs/langchain/lib/python3.11/site-packages/pydantic/main.py", line 828, in __getattr__
    raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
AttributeError: 'CompletionUsage' object has no attribute 'get'
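
For context, a minimal sketch of the failure mode (assuming openai>=1.x; litellm's own Usage type adds dict-style access on top of openai's plain pydantic CompletionUsage):

from openai.types import CompletionUsage

usage = CompletionUsage(prompt_tokens=32, completion_tokens=55, total_tokens=87)
print(usage.prompt_tokens)     # 32: attribute access works
usage.get("prompt_tokens", 0)  # AttributeError: no dict-style access on a plain pydantic model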

Twitter / LinkedIn details

No response

krrishdholakia commented 1 month ago

@lazyhope unable to repro this - the Usage class implements .get, so dict-style access should work. Can you give me a repro script?

Here's how I tested it:

from litellm import completion_cost
from litellm.types.utils import Choices, Message, ModelResponse, Usage

# Build a fake response and check that cost calculation succeeds.
response_object = ModelResponse(
    id="26c0ef045020429d9c5c9b078c01e564",
    choices=[
        Choices(
            finish_reason="stop",
            index=0,
            message=Message(
                content="Hello! I'm Litellm Bot, your helpful assistant. While I can't provide real-time weather updates, I can help you find a reliable weather service or guide you on how to check the weather on your device. Would you like assistance with that?",
                role="assistant",
                tool_calls=None,
                function_call=None,
            ),
        )
    ],
    created=1722124652,
    model="vertex_ai/mistral-large",
    object="chat.completion",
    system_fingerprint=None,
    usage=Usage(prompt_tokens=32, completion_tokens=55, total_tokens=87),
)
model = "mistral-large@2407"
messages = [{"role": "user", "content": "Hey, hows it going???"}]
custom_llm_provider = "vertex_ai"
predictive_cost = completion_cost(
    completion_response=response_object,
    model=model,
    messages=messages,
    custom_llm_provider=custom_llm_provider,
)

assert predictive_cost > 0

lazyhope commented 1 month ago


It seems I was using an older version of the package, sorry for the false alarm!

lazyhope commented 1 month ago

@krrishdholakia I am now actually able to reproduce the error using instructor==1.3.5 and the latest litellm with the following code:

import os, asyncio
from instructor import from_litellm, Mode
from litellm import acompletion
from pydantic import BaseModel

class User(BaseModel):
    name: str

client = from_litellm(acompletion, mode=Mode.MD_JSON)
asyncio.run(
    client.chat.completions.create(
        messages=[{"role": "user", "content": "Joe"}],
        response_model=User,
        api_key=os.getenv("GOOGLE_API_KEY"),
        model="gemini/gemini-1.5-pro-exp-0801",
    )
)

It seems to happen during the second retry attempt. I'll try to work out whether it's an issue with litellm or instructor.

lazyhope commented 1 month ago

After some debugging, it seems that once the threads defined here start: https://github.com/BerriAI/litellm/blob/d0a68ab123a8fb5b3cc1c137f41ee1ae408571cb/litellm/utils.py#L1494-L1499 the value of result.get("usage") inside https://github.com/BerriAI/litellm/blob/d0a68ab123a8fb5b3cc1c137f41ee1ae408571cb/litellm/litellm_core_utils/litellm_logging.py#L624-L628 at some point changes from Usage to CompletionUsage (potentially mutated by one of the running threads), which causes the error.
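
A hypothetical minimal illustration of that kind of race (simplified stand-in classes, not litellm's actual ones): a background logging thread swaps the shared response's usage object for a type without .get, so a later dict-style read fails.

import threading
import time

class Usage(dict):
    """Stand-in for litellm's Usage, which supports dict-style .get()."""

class CompletionUsage:
    """Stand-in for openai's pydantic CompletionUsage: attributes only, no .get()."""
    prompt_tokens = 32

class Response:
    def __init__(self):
        self.usage = Usage(prompt_tokens=32)

resp = Response()

def logging_worker(r):
    # Simulates a success-handler thread rebuilding the shared response in place.
    r.usage = CompletionUsage()

threading.Thread(target=logging_worker, args=(resp,)).start()
time.sleep(0.1)  # give the worker a head start
resp.usage.get("prompt_tokens", 0)  # AttributeError once the swap has happened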

@krrishdholakia I think the rest may be beyond my knowledge so could you please take a look at it?

krrishdholakia commented 1 month ago

Thanks for the great work @lazyhope, I'll take a look at this now.

krrishdholakia commented 1 month ago

able to repro

krrishdholakia commented 1 month ago

This seems to happen only when using instructor - I wonder if it's modifying some param.

lazyhope commented 1 month ago

@krrishdholakia Sorry for bothering again, but I found that in the latest version, when serving with FastAPI:

import os
from instructor import from_litellm, Mode
from litellm import acompletion, Usage
from pydantic import BaseModel
from fastapi import FastAPI

class User(BaseModel):
    name: str

app = FastAPI()

aclient = from_litellm(acompletion, mode=Mode.MD_JSON)

@app.get("/")
async def get_res():
    user = await aclient.chat.completions.create(
        messages=[{"role": "user", "content": "Joe"}],
        response_model=User,
        api_key=os.getenv("GOOGLE_API_KEY"),
        model="gemini/gemini-1.5-flash",
    )
    print(user._raw_response.usage)

The printed usage becomes CompletionUsage(completion_tokens=11, prompt_tokens=137, total_tokens=148) again, but if I run the inner function code directly in IPython, the usage type is Usage.

This nondeterministic behaviour is quite confusing to me, and I suspect it has something to do with thread management.
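
Until the root cause is fixed, a defensive read that tolerates both shapes could look like this (a workaround sketch; get_prompt_tokens is a hypothetical helper, not part of litellm):

def get_prompt_tokens(usage):
    # litellm's Usage supports dict-style .get(); openai's CompletionUsage
    # only exposes attributes, so fall back to getattr.
    if hasattr(usage, "get"):
        return usage.get("prompt_tokens", 0)
    return getattr(usage, "prompt_tokens", 0)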

johnreyev commented 3 weeks ago

I'm getting the same error when using instructor and trying to get the completion_cost:

litellm==1.43.18, instructor==1.3.7

from typing import List

import instructor
from litellm import Router, completion_cost
from pydantic import BaseModel, Field

router = <your_router_here>
instructions = <instruction_content_here>

class RefinedTopics(BaseModel):
    topics: List[str] = Field(description="a list of refined topics")

llm = instructor.from_litellm(router.completion)

response = llm.chat.completions.create(
    model=MODEL_NAME,
    response_model=RefinedTopics,
    max_retries=5,
    messages=[
        {
            "role": "user",
            "content": instructions,
        }
    ],
)

cost = completion_cost(completion_response=response)

ERROR - Something went wrong 'RefinedTopics' object has no attribute 'get'

Edit: (This works)

response, completion = llm.chat.completions.create_with_completion(
    model=MODEL_NAME,
    response_model=RefinedTopics,
    max_retries=5,
    messages=[
        {
            "role": "user",
            "content": instructions,
        }
    ],
)

cost = completion_cost(completion_response=completion)
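
This presumably works because instructor's create_with_completion returns both the parsed response_model instance and the raw completion, so completion_cost receives an actual completion response instead of the RefinedTopics pydantic model (which, like CompletionUsage, has no .get).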