BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
https://docs.litellm.ai/docs/

[Bug]: Broken s3 cache creation with streaming? #3268

Open · Manouchehri opened this issue 2 months ago

Manouchehri commented 2 months ago

What happened?

Caching does not seem to be working with this streaming PoC:

#!/usr/bin/env python3.11
# -*- coding: utf-8 -*-
# Author: David Manouchehri

import os
import asyncio
import openai
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

c_handler = logging.StreamHandler()
logger.addHandler(c_handler)

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_API_BASE = os.getenv("OPENAI_API_BASE") or "https://api.openai.com/v1"

client = openai.AsyncOpenAI(
    api_key=OPENAI_API_KEY,
    base_url=OPENAI_API_BASE,
)

async def main():
    response = await client.chat.completions.create(
        model="gemini-1.5-pro-preview-0409",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What’s in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                        }
                    }
                ]
            }
        ],
        stream=True,  # streaming request; the resulting response never seems to be written to the s3 cache
        temperature=0.0,
    )

    logger.debug("Failed to print non-stream")

    current_str = ""
    async for chunk in response:
        logger.debug(chunk)
        if chunk.choices[0].delta.content:
            current_str += chunk.choices[0].delta.content

        logger.debug(current_str)
        logger.debug("---")

if __name__ == "__main__":
    asyncio.run(main())

Caching works with the same request when streaming is disabled:

#!/usr/bin/env python3.11
# -*- coding: utf-8 -*-
# Author: David Manouchehri

import os
import asyncio
import openai
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

c_handler = logging.StreamHandler()
logger.addHandler(c_handler)

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_API_BASE = os.getenv("OPENAI_API_BASE") or "https://api.openai.com/v1"

client = openai.AsyncOpenAI(
    api_key=OPENAI_API_KEY,
    base_url=OPENAI_API_BASE,
)

async def main():
    response = await client.chat.completions.create(
        model="gemini-1.5-pro-preview-0409",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What’s in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                        }
                    }
                ]
            }
        ],
        stream=False,  # identical request without streaming; this response does get cached
        temperature=0.0,
    )

    logger.debug(response.model_dump_json(indent=2))

if __name__ == "__main__":
    asyncio.run(main())

Note: if you run the non-streaming script first, the streaming script will then successfully read from the cache. So cache reads work with streaming; it's the cache write from a streamed response that appears broken.
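
For reference, here is a minimal sketch (not part of the original scripts; the helper is hypothetical) for checking whether a cache object was actually written to the bucket after each run. It assumes the same CACHING_* environment variables as the proxy config below and simply lists the most recently modified objects with boto3:

#!/usr/bin/env python3.11
# Hypothetical check, not from the original report: list the newest objects
# in the cache bucket to see whether the streaming call wrote a cache entry.
# Reuses the CACHING_* env vars from the proxy config below.

import os
import boto3

s3 = boto3.client(
    "s3",
    region_name=os.environ["CACHING_AWS_DEFAULT_REGION"],
    aws_access_key_id=os.environ["CACHING_AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["CACHING_AWS_SECRET_ACCESS_KEY"],
    endpoint_url=os.environ.get("CACHING_AWS_ENDPOINT_URL_S3"),
)

resp = s3.list_objects_v2(Bucket=os.environ["CACHING_S3_BUCKET_NAME"])
objects = sorted(
    resp.get("Contents", []),
    key=lambda obj: obj["LastModified"],
    reverse=True,
)
for obj in objects[:10]:
    print(obj["LastModified"], obj["Key"], obj["Size"])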

Relevant log output

No response

Twitter / LinkedIn details

https://www.linkedin.com/in/davidmanouchehri/

krrishdholakia commented 2 months ago

I don't see how you've set up caching. Can you share that too?

Manouchehri commented 2 months ago

litellm_settings:
  drop_params: True
  cache: True
  cache_params:
    type: s3
    s3_bucket_name: os.environ/CACHING_S3_BUCKET_NAME
    s3_region_name: os.environ/CACHING_AWS_DEFAULT_REGION
    s3_aws_access_key_id: os.environ/CACHING_AWS_ACCESS_KEY_ID
    s3_aws_secret_access_key: os.environ/CACHING_AWS_SECRET_ACCESS_KEY
    s3_endpoint_url: os.environ/CACHING_AWS_ENDPOINT_URL_S3
  failure_callback: ["sentry", "langfuse"]
  num_retries_per_request: 3
  success_callback: ["langfuse", "s3"]
  s3_callback_params:
    s3_bucket_name: os.environ/LOGGING_S3_BUCKET_NAME
    s3_region_name: os.environ/LOGGING_AWS_DEFAULT_REGION
    s3_aws_access_key_id: os.environ/LOGGING_AWS_ACCESS_KEY_ID
    s3_aws_secret_access_key: os.environ/LOGGING_AWS_SECRET_ACCESS_KEY
    s3_endpoint_url: os.environ/LOGGING_AWS_ENDPOINT_URL_S3
  default_team_settings:
    - team_id: david_dev
      success_callback: ["langfuse", "s3"]
      langfuse_secret: os.environ/LANGFUSE_PRIVATE_KEY_DAVID
      langfuse_public_key: os.environ/LANGFUSE_PUBLIC_KEY_DAVID

general_settings: 
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  database_connection_pool_limit: 1
  disable_spend_logs: True

router_settings:
  routing_strategy: simple-shuffle

environment_variables:

model_list:
  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: vertex_ai/gemini-1.5-pro-preview-0409
      vertex_project: litellm-epic
      vertex_location: northamerica-northeast1
      max_tokens: 8192

  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: vertex_ai/gemini-1.5-pro-preview-0409
      vertex_project: litellm-epic
      vertex_location: southamerica-east1
      max_tokens: 8192

I am using a key that belongs to david_dev.
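
For what it's worth, here is a minimal sketch of the same s3 cache configured directly in the litellm Python SDK (an assumption on my part; I have only reproduced this through the proxy). It could help narrow down whether the missing cache write happens in the streaming wrapper itself or only in the proxy path. The bucket and region reuse the CACHING_* variables above; AWS credentials are assumed to come from the standard boto3 environment.

#!/usr/bin/env python3.11
# Hypothetical SDK-level repro, not from the original report.
# Configures the s3 cache directly in litellm, makes a streaming call,
# then repeats the identical request without streaming; if streamed
# responses were being cached, the second call should be a cache hit.

import os
import asyncio
import litellm
from litellm.caching import Cache

litellm.cache = Cache(
    type="s3",
    s3_bucket_name=os.environ["CACHING_S3_BUCKET_NAME"],
    s3_region_name=os.environ["CACHING_AWS_DEFAULT_REGION"],
)

MESSAGES = [{"role": "user", "content": "What's in this image?"}]

async def main():
    stream = await litellm.acompletion(
        model="vertex_ai/gemini-1.5-pro-preview-0409",
        messages=MESSAGES,
        stream=True,
        temperature=0.0,
        caching=True,
    )
    async for _ in stream:  # drain the stream so the full response is assembled
        pass

    # Identical request without streaming; should hit the cache if the
    # streamed response above was written to s3.
    cached = await litellm.acompletion(
        model="vertex_ai/gemini-1.5-pro-preview-0409",
        messages=MESSAGES,
        temperature=0.0,
        caching=True,
    )
    print(cached)

if __name__ == "__main__":
    asyncio.run(main())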

krrishdholakia commented 2 months ago

I believe we have some testing on this. Will look into this more.

krrishdholakia commented 2 months ago

@Manouchehri it would help if you could add any bugs you believe we should prioritize to this week's bug bash - https://github.com/BerriAI/litellm/issues/3045

Manouchehri commented 2 months ago

Heading to bed atm, will do tomorrow! Thank you! This one and the s3 team logging are the two highest priorities for me for sure.

Do you want me to maybe create GitHub issue labels for low, medium, high, and critical priorities? That's what my team does for our internal projects. 😀

Manouchehri commented 2 months ago

This is still a bug btw; I checked again today.