[Bug]: Race condition: Wrong trace_id sent to Langfuse when Redis caching is enabled #6783

yuriykuzin commented 1 day ago

What happened

When LiteLLM is used with Redis caching enabled and parallel calls are made, incorrect trace_ids are sent to Langfuse, even though langfuse_context.get_current_trace_id() returns the correct value. The issue appears to be a race condition specific to Redis caching: the problem disappears when only the in-memory cache is used.

LiteLLM version: 1.52.9

Steps to Reproduce

Reproduction Code

import asyncio
from litellm import Router
import litellm
from langfuse.decorators import observe
import os
from langfuse.decorators import langfuse_context

# Configuration
MODEL_NAME = "your-model-name"  # Change to your deployment name
API_BASE = "https://your-endpoint.openai.azure.com"  # Insert your api base
API_VERSION = "2023-12-01-preview"
API_KEY = os.getenv("AZURE_API_KEY")
REDIS_URL = "redis://localhost:6379"

# Langfuse configuration
os.environ["LANGFUSE_HOST"] = "your-langfuse-host"
os.environ["LANGFUSE_PUBLIC_KEY"] = "your-public-key"
os.environ["LANGFUSE_SECRET_KEY"] = "your-secret-key"

# Configure LiteLLM callbacks
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

# Initialize router
router = Router(
    model_list=[
        {
            "model_name": MODEL_NAME,
            "litellm_params": {
                "model": f"azure/{MODEL_NAME}",
                "api_base": API_BASE,
                "api_key": API_KEY,
                "api_version": API_VERSION,
            },
        }
    ],
    default_litellm_params={"acompletion": True},
    # Once Redis is enabled here, the Langfuse integration sends the wrong
    # trace_id in parallel calls:
    redis_url=REDIS_URL,
)

async def call_llm(prompt: str):
    # Correct trace_id is printed here:
    print(
        "get_current_trace_id:",
        langfuse_context.get_current_trace_id(),
    )

    # Surprisingly, acompletion() works correctly, but we need
    # completions.create() to be fixed, since we rely on it for the
    # Instructor integration.
    # response = await router.acompletion(

    response = await router.chat.completions.create(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": prompt}],
        metadata={
            "trace_id": langfuse_context.get_current_trace_id(),
            "generation_name": prompt,
            "debug_langfuse": True,
        },
    )
    return response

@observe()
async def process():
    # First call with Request1
    await call_llm("Tell me the result of 2+2")

    # Second call with Request2
    await call_llm("Do you like Math, yes or no?")

async def main():
    # Run two process functions in parallel
    await asyncio.gather(process(), process())

if __name__ == "__main__":
    asyncio.run(main())
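For reference, the working variant mentioned in the comment inside call_llm() only swaps the entry point; the arguments are identical. A sketch of the drop-in replacement for the create() call, under the same configuration as above:

    # Per the comment in call_llm(): acompletion() reports the correct
    # trace_id, while chat.completions.create() does not.
    response = await router.acompletion(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": prompt}],
        metadata={
            "trace_id": langfuse_context.get_current_trace_id(),
            "generation_name": prompt,
            "debug_langfuse": True,
        },
    )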

Current Behavior

When Redis caching is enabled and parallel calls are made:

- langfuse_context.get_current_trace_id() returns the correct trace_id.
- However, the wrong trace_id is sent to Langfuse.
- This can be verified by adding a print statement just before line 296 in litellm/integrations/langfuse/langfuse.py.
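A minimal sketch of that debug print, assuming the local variable at that point is named trace_id (the actual name in langfuse.py may differ):

# Hypothetical: added just before the trace is dispatched, around line 296
# of litellm/integrations/langfuse/langfuse.py; "trace_id" stands in for
# whichever local variable carries the outgoing ID.
print("Real sent trace_id:", trace_id)

With that print in place, a parallel run produces output like: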

get_current_trace_id: c45394a2-4fa0-4599-aa3c-88a101b35868
get_current_trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
Real sent trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
get_current_trace_id: c45394a2-4fa0-4599-aa3c-88a101b35868
Real sent trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
get_current_trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
Real sent trace_id: c45394a2-4fa0-4599-aa3c-88a101b35868
Real sent trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193

Here each trace_id should have been sent twice, but c45394a2-4fa0-4599-aa3c-88a101b35868 was sent only once, while fcb74aee-2de0-465e-b1f3-afd4730fe193 was sent three times.

Expected Behavior

The correct trace_id should be sent to Langfuse, matching the one returned by langfuse_context.get_current_trace_id(). Trace IDs should remain consistent regardless of whether Redis caching is enabled.

For comparison, here is what is sent in the same example when Redis is disabled:

get_current_trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
get_current_trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
Real sent trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
get_current_trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
Real sent trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
get_current_trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
Real sent trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
Real sent trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3

Each trace_id is sent exactly twice, as expected.

Additional Notes

Possible Investigation Points

- Race condition in how trace IDs are handled when Redis caching is enabled (illustrated by the sketch below).
- Difference in trace ID handling between acompletion() and completions.create().
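As a purely hypothetical illustration of the suspected failure mode (this is not LiteLLM's actual code): if the integration writes request metadata to state shared across concurrent tasks and reads it back later, at callback time, the last writer wins and the wrong trace_id gets reported:

import asyncio

# Hypothetical sketch: metadata written to shared state at request time
# but read at callback time. Not taken from the LiteLLM source.
shared_state = {}

async def call(trace_id: str):
    shared_state["trace_id"] = trace_id  # write at request time
    await asyncio.sleep(0)  # yield; the other task writes here
    print("reported:", shared_state["trace_id"])  # read at "callback" time

async def demo():
    await asyncio.gather(call("trace-A"), call("trace-B"))

asyncio.run(demo())  # prints "reported: trace-B" twice, mirroring the duplicated IDs above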

Files to Look At

- litellm/integrations/langfuse/langfuse.py (around line 296, where the trace_id is sent to Langfuse)

Let me know if you need any additional information or clarification.

yuriykuzin commented 17 hours ago

In fact, it's worse: when Redis caching is enabled, the entire Langfuse report for parallel calls is sometimes wrong, not just the trace_id.