[Bug]: Race condition: Wrong trace_id sent to Langfuse when Redis caching is enabled #6783

yuriykuzin commented 1 day ago

What happened

When LiteLLM is used with Redis caching enabled and parallel calls are made, incorrect trace_ids are sent to Langfuse, even though langfuse_context.get_current_trace_id() returns the correct value. The issue appears to be a race condition specific to Redis caching: the problem disappears when only the in-memory cache is used.

LiteLLM version: 1.52.9

Steps to Reproduce

Reproduction Code

import asyncio
from litellm import Router
import litellm
from langfuse.decorators import observe
import os
from langfuse.decorators import langfuse_context

# Configuration
MODEL_NAME = "your-model-name"  # Change to your deployment name
API_BASE = "https://your-endpoint.openai.azure.com"  # Insert your api base
API_VERSION = "2023-12-01-preview"
API_KEY = os.getenv("AZURE_API_KEY")
REDIS_URL = "redis://localhost:6379"

# Langfuse configuration
os.environ["LANGFUSE_HOST"] = "your-langfuse-host"
os.environ["LANGFUSE_PUBLIC_KEY"] = "your-public-key"
os.environ["LANGFUSE_SECRET_KEY"] = "your-secret-key"

# Configure LiteLLM callbacks
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

# Initialize router
router = Router(
    model_list=[
        {
            "model_name": MODEL_NAME,
            "litellm_params": {
                "model": f"azure/{MODEL_NAME}",
                "api_base": API_BASE,
                "api_key": API_KEY,
                "api_version": API_VERSION,
            },
        }
    ],
    default_litellm_params={"acompletion": True},
    # Once Redis is enabled here, the Langfuse integration sends the wrong
    # trace_id in parallel calls:
    redis_url=REDIS_URL,
)

async def call_llm(prompt: str):
    # Correct trace_id is printed here:
    print(
        "get_current_trace_id:",
        langfuse_context.get_current_trace_id(),
    )

    # Surprisingly, acompletion() works correctly, but we need
    # completions.create() to be fixed, since we rely on it for the
    # Instructor integration.
    # response = await router.acompletion(

    response = await router.chat.completions.create(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": prompt}],
        metadata={
            "trace_id": langfuse_context.get_current_trace_id(),
            "generation_name": prompt,
            "debug_langfuse": True,
        },
    )
    return response

@observe()
async def process():
    # First call with Request1
    await call_llm("Tell me the result of 2+2")

    # Second call with Request2
    await call_llm("Do you like Math, yes or no?")

async def main():
    # Run two process functions in parallel
    await asyncio.gather(process(), process())

if __name__ == "__main__":
    asyncio.run(main())
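For reference, the working variant mentioned in the comment inside call_llm() only swaps the entry point; the arguments are identical. A sketch of the drop-in replacement for the create() call, under the same configuration as above:

    # Per the comment in call_llm(): acompletion() reports the correct
    # trace_id, while chat.completions.create() does not.
    response = await router.acompletion(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": prompt}],
        metadata={
            "trace_id": langfuse_context.get_current_trace_id(),
            "generation_name": prompt,
            "debug_langfuse": True,
        },
    )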

Current Behavior

When Redis caching is enabled and parallel calls are made:

- langfuse_context.get_current_trace_id() returns the correct trace_id.
- However, the wrong trace_id is sent to Langfuse.
- This can be verified by adding a print statement just before line 296 in litellm/integrations/langfuse/langfuse.py.
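A minimal sketch of that debug print, assuming the local variable at that point is named trace_id (the actual name in langfuse.py may differ):

# Hypothetical: added just before the trace is dispatched, around line 296
# of litellm/integrations/langfuse/langfuse.py; "trace_id" stands in for
# whichever local variable carries the outgoing ID.
print("Real sent trace_id:", trace_id)

With that print in place, a parallel run produces output like: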

get_current_trace_id: c45394a2-4fa0-4599-aa3c-88a101b35868
get_current_trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
Real sent trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
get_current_trace_id: c45394a2-4fa0-4599-aa3c-88a101b35868
Real sent trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
get_current_trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
Real sent trace_id: c45394a2-4fa0-4599-aa3c-88a101b35868
Real sent trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193

Here each trace_id should have been sent twice, but c45394a2-4fa0-4599-aa3c-88a101b35868 was sent only once, while fcb74aee-2de0-465e-b1f3-afd4730fe193 was sent three times.

Expected Behavior

The correct trace_id should be sent to Langfuse, matching the one returned by langfuse_context.get_current_trace_id(). Trace IDs should remain consistent regardless of whether Redis caching is enabled.

For comparison, here is what is sent in the same example when Redis is disabled:

get_current_trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
get_current_trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
Real sent trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
get_current_trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
Real sent trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
get_current_trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
Real sent trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
Real sent trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3

Each trace_id is sent exactly twice, as expected.

Additional Notes

Possible Investigation Points

- Race condition in how trace IDs are handled when Redis caching is enabled (illustrated by the sketch below).
- Difference in trace ID handling between acompletion() and completions.create().
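As a purely hypothetical illustration of the suspected failure mode (this is not LiteLLM's actual code): if the integration writes request metadata to state shared across concurrent tasks and reads it back later, at callback time, the last writer wins and the wrong trace_id gets reported:

import asyncio

# Hypothetical sketch: metadata written to shared state at request time
# but read at callback time. Not taken from the LiteLLM source.
shared_state = {}

async def call(trace_id: str):
    shared_state["trace_id"] = trace_id  # write at request time
    await asyncio.sleep(0)  # yield; the other task writes here
    print("reported:", shared_state["trace_id"])  # read at "callback" time

async def demo():
    await asyncio.gather(call("trace-A"), call("trace-B"))

asyncio.run(demo())  # prints "reported: trace-B" twice, mirroring the duplicated IDs above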

Files to Look At

- litellm/integrations/langfuse/langfuse.py (around line 296, where the trace_id is sent to Langfuse)

Let me know if you need any additional information or clarification.

yuriykuzin commented 17 hours ago

In fact, it's worse: when Redis caching is enabled, the entire Langfuse report for parallel calls is sometimes wrong, not just the trace_id.