When LiteLLM is used with Redis caching enabled and parallel calls are made, incorrect trace_ids are sent to Langfuse, even though langfuse_context.get_current_trace_id() returns the correct value. The issue appears to be a race condition that only occurs when Redis caching is enabled; it disappears when only the in-memory cache is used.
LiteLLM version: 1.52.9
Steps to Reproduce
Set up LiteLLM with Redis caching and Langfuse integration
import asyncio
import os

import litellm
from langfuse.decorators import langfuse_context, observe
from litellm import Router

# Configuration
MODEL_NAME = "your-model-name"  # Change to your deployment name
API_BASE = "https://your-endpoint.openai.azure.com"  # Insert your api base
API_VERSION = "2023-12-01-preview"
API_KEY = os.getenv("AZURE_API_KEY")
REDIS_URL = "redis://localhost:6379"

# Langfuse configuration
os.environ["LANGFUSE_HOST"] = "your-langfuse-host"
os.environ["LANGFUSE_PUBLIC_KEY"] = "your-public-key"
os.environ["LANGFUSE_SECRET_KEY"] = "your-secret-key"

# Configure LiteLLM callbacks
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

# Initialize router
router = Router(
    model_list=[
        {
            "model_name": MODEL_NAME,
            "litellm_params": {
                "model": f"azure/{MODEL_NAME}",
                "api_base": API_BASE,
                "api_key": API_KEY,
                "api_version": API_VERSION,
            },
        }
    ],
    default_litellm_params={"acompletion": True},
    # Once Redis is enabled here, the Langfuse integration sends the wrong
    # trace_id in parallel calls:
    redis_url=REDIS_URL,
)


async def call_llm(prompt: str):
    # The correct trace_id is printed here:
    print(
        "get_current_trace_id:",
        langfuse_context.get_current_trace_id(),
    )
    # Surprisingly, acompletion() works correctly, but we need
    # completions.create() to be fixed, since we rely on it for the
    # integration with Instructor.
    # response = await router.acompletion(
    response = await router.chat.completions.create(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": prompt}],
        metadata={
            "trace_id": langfuse_context.get_current_trace_id(),
            "generation_name": prompt,
            "debug_langfuse": True,
        },
    )
    return response


@observe()
async def process():
    # First call
    await call_llm("Tell me the result of 2+2")
    # Second call
    await call_llm("Do you like Math, yes or no?")


async def main():
    # Run two process() coroutines in parallel
    await asyncio.gather(process(), process())


if __name__ == "__main__":
    asyncio.run(main())
Current Behavior
When Redis caching is enabled and parallel calls are made:
langfuse_context.get_current_trace_id() returns the correct trace_id
However, the wrong trace_id is being sent to Langfuse
This can be verified by adding a print statement before line 296 in litellm/integrations/langfuse/langfuse.py
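The added print was essentially the following (a hypothetical sketch; the exact variable name at that point in the file may differ):

print("Real sent trace_id:", trace_id)

With Redis enabled, the interleaved output of the two print statements was: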
get_current_trace_id: c45394a2-4fa0-4599-aa3c-88a101b35868
get_current_trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
Real sent trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
get_current_trace_id: c45394a2-4fa0-4599-aa3c-88a101b35868
Real sent trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
get_current_trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
Real sent trace_id: c45394a2-4fa0-4599-aa3c-88a101b35868
Real sent trace_id: fcb74aee-2de0-465e-b1f3-afd4730fe193
Here, c45394a2-4fa0-4599-aa3c-88a101b35868 should have been sent twice but was sent only once, and fcb74aee-2de0-465e-b1f3-afd4730fe193 should also have been sent twice but was sent three times.
Expected Behavior
The correct trace_id should be sent to Langfuse, matching the one returned by langfuse_context.get_current_trace_id().
Trace IDs should remain consistent regardless of whether Redis caching is enabled or not.
For comparison, here is what is sent in the same example when Redis is disabled:
get_current_trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
get_current_trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
Real sent trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
get_current_trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
Real sent trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
get_current_trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
Real sent trace_id: 94e7c707-0bd7-47a9-8e25-bc8f8eca2b6d
Real sent trace_id: 3a0d9972-9730-465e-9a63-840e9c8f8fd3
Each trace_id is sent exactly twice, as expected.
Additional Notes
The issue only occurs when Redis caching is enabled.
The problem disappears when using in-memory cache only.
Interestingly, router.acompletion() works correctly, while router.chat.completions.create() exhibits the issue. This affects integrations that specifically need completions.create(), such as Instructor. Two workaround sketches follow this list.
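Based on these observations, here are two ways the symptom can be sidestepped while the bug stands. Neither is a fix, and both are fragments meant to slot into the reproduction script above (they assume the model_list, MODEL_NAME, and prompt names from that script):

# 1) Drop redis_url: without it the Router falls back to the in-memory
#    cache, and the trace_ids line up again.
router = Router(
    model_list=model_list,  # same model_list as in the reproduction
    default_litellm_params={"acompletion": True},
    # no redis_url -> in-memory cache only; the issue does not reproduce
)

# 2) Call router.acompletion() instead of router.chat.completions.create()
#    wherever the OpenAI-compatible interface is not required:
response = await router.acompletion(
    model=MODEL_NAME,
    messages=[{"role": "user", "content": prompt}],
    metadata={"trace_id": langfuse_context.get_current_trace_id()},
)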
Possible Investigation Points
Race condition in how trace IDs are handled when Redis caching is enabled (see the illustrative sketch below).
Difference in trace ID handling between acompletion() and completions.create().
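To make the first point above concrete, here is an illustrative, self-contained sketch (not LiteLLM code) of how state shared across concurrent tasks can produce exactly this symptom, with one task reading back a trace_id that another task wrote:

import asyncio

shared_state = {}  # stands in for any state shared by parallel calls

async def call(trace_id: str):
    shared_state["trace_id"] = trace_id  # per-call write to shared state
    await asyncio.sleep(0)  # yield; another task may overwrite the value
    print("sent:", shared_state["trace_id"], "expected:", trace_id)

async def main():
    await asyncio.gather(call("trace-a"), call("trace-b"))

asyncio.run(main())

Whether the Langfuse integration actually shares state this way when Redis is enabled is what would need to be confirmed around the cited line.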
Files to Look At
litellm/integrations/langfuse/langfuse.py (specifically around line 296)
Let me know if you need any additional information or clarification.