Open tyler-liner opened 22 hours ago
hi @tyler-liner
hi @ishaan-jaff
- can you try using the litellm router - we pre-create OpenAI clients on the router to address this problem https://docs.litellm.ai/docs/routing
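(For reference, a minimal sketch of the suggested Router setup per the linked routing docs; the model name, API key, and variable names here are illustrative, not the issue author's actual config:)

```python
from litellm import Router

# The Router pre-creates OpenAI clients once and reuses them across
# requests, instead of constructing a new client per acompletion call.
router = Router(
    model_list=[
        {
            "model_name": "gpt-4o-mini",
            "litellm_params": {
                "model": "gpt-4o-mini",
                "api_key": "sk-...",  # hypothetical placeholder
            },
        }
    ]
)

# Usage mirrors litellm.acompletion:
# stream = await router.acompletion(
#     model="gpt-4o-mini", messages=[...], stream=True
# )
```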
In the actual program built using LiteLLM, I am using a router, and the same issue occurs. The code attached in the issue description is a minimal reproducible example where the memory leak occurs.
- what does your memory profiler show as increasing your memory usage? typically memory profilers show the block of code allocating memory
When observing with memory_profiler, the memory usage noticeably increases by 0-0.1 MiB at the `stream = await litellm.acompletion` line and by 0.4-0.5 MiB at the `async for chunk in stream` line with every request.
Below is the output from the profiler. Even though the requests were sent with intervals between them, the memory usage continues to accumulate.
```
INFO:     127.0.0.1:54455 - "POST /debug HTTP/1.1" 200 OK

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    45    414.2 MiB    414.2 MiB           1   @profile
    46                                         async def main_logic(query) -> str:
    47    414.3 MiB      0.1 MiB          15       stream = await router.acompletion(
    48    414.2 MiB      0.0 MiB           1           model="gpt-4o-mini",
    49    414.2 MiB      0.0 MiB           1           api_key=config.openai_api_keys[-1],
    50    414.2 MiB      0.0 MiB           1           messages=[{"role": "user", "content": query}],
    51    414.2 MiB      0.0 MiB           1           stream=True,
    52                                             )
    53    414.3 MiB      0.0 MiB           1       result = ""
    54    415.3 MiB      1.0 MiB         491       async for chunk in stream:
    55    415.1 MiB      0.0 MiB         428           result += chunk.choices[0].delta.content or ""
    56
    57    415.3 MiB      0.0 MiB           1       return result

INFO:     127.0.0.1:54455 - "POST /debug HTTP/1.1" 200 OK

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    45    415.3 MiB    415.3 MiB           1   @profile
    46                                         async def main_logic(query) -> str:
    47    415.3 MiB      0.0 MiB           8       stream = await router.acompletion(
    48    415.3 MiB      0.0 MiB           1           model="gpt-4o-mini",
    49    415.3 MiB      0.0 MiB           1           api_key=config.openai_api_keys[-1],
    50    415.3 MiB      0.0 MiB           1           messages=[{"role": "user", "content": query}],
    51    415.3 MiB      0.0 MiB           1           stream=True,
    52                                             )
    53    415.3 MiB      0.0 MiB           1       result = ""
    54    415.6 MiB      0.4 MiB         476       async for chunk in stream:
    55    415.5 MiB      0.0 MiB         411           result += chunk.choices[0].delta.content or ""
    56
    57    415.6 MiB      0.0 MiB           1       return result
```
does the memory leak occur without langfuse on?
Yes, the memory leak occurs even when using only the code above, without Langfuse.
FYI, the graph below shows the memory usage when one user repeatedly sends the same request to the server running the code above. (In the test, each request was sent only after the previous response was received, so no concurrent requests were being processed.)
What happened?
```
fastapi==0.115.2
langfuse==2.45.0
litellm==1.50.1
```
Hello,
I am developing an LLM application using FastAPI + LiteLLM + Langfuse. However, I have noticed a continuous increase in memory usage as more requests are processed. Upon further investigation, I observed that the memory consumption increases with each call to LiteLLM's `acompletion`.
Since the application code involves multiple complex packages, I am attaching a minimal reproducible example that demonstrates the increase in memory usage.
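(The attached example did not survive here; the following is a minimal sketch of the kind of reproducer described. The `/debug` route is taken from the log lines and profiler output above; the request model, port, and API key are illustrative assumptions:)

```python
import litellm
import uvicorn
from fastapi import FastAPI
from memory_profiler import profile
from pydantic import BaseModel

app = FastAPI()


class DebugRequest(BaseModel):
    # Assumed request body shape: {"query": "..."}
    query: str


@profile  # prints per-line memory usage on every call
async def main_logic(query: str) -> str:
    stream = await litellm.acompletion(
        model="gpt-4o-mini",
        api_key="sk-...",  # hypothetical placeholder
        messages=[{"role": "user", "content": query}],
        stream=True,
    )
    result = ""
    async for chunk in stream:
        result += chunk.choices[0].delta.content or ""
    return result


@app.post("/debug")
async def debug(req: DebugRequest) -> str:
    return await main_logic(req.query)


if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)
```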
Running the server as shown above and continuously sending requests causes the memory usage to increase linearly.
When observing with memory_profiler, it is noticeable that the memory usage increases by 0-0.1 MiB at the `stream = await litellm.acompletion` line and by 0.4-0.5 MiB at the `async for chunk in stream` line with every request. This can be reproduced by continuously sending requests containing the same query at 1-second intervals, as in the sketch below.
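(A sketch of such a request loop, assuming the example server above is listening on 127.0.0.1:8000; the payload is illustrative:)

```python
import time

import requests  # any HTTP client works; requests is used for brevity

# Send the same query once per second, waiting for each response before
# sending the next, so no concurrent requests are in flight.
while True:
    resp = requests.post(
        "http://127.0.0.1:8000/debug",
        json={"query": "Tell me a short story."},  # illustrative payload
    )
    resp.raise_for_status()
    time.sleep(1)
```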
Is there an appropriate solution for this? About 0.5 MB of memory keeps accumulating per request.
Thank you.