Closed: PrathamSoni closed this issue 11 months ago.
Hey @PrathamSoni thanks for the issue - do you have a way for me to repro this bug?
I'll push a fix for it.
```python
import os

import psutil

%load_ext dotenv
%dotenv

def print_open_files():
    pid = os.getpid()
    proc = psutil.Process(pid)
    print(f"Number of file descriptors for process {pid}: {proc.num_fds()}")

from litellm import completion

print_open_files()
for i in range(10):
    response = completion(
        model="anyscale/mistralai/Mistral-7B-Instruct-v0.1",
        messages=[
            {"role": "system", "content": "You're a chat bot"},
            {"role": "user", "content": "Whats 1+1"},
        ],
    )
    # print(response)
    print("Iteration", i)
    print_open_files()
```
Gives something like:

```
Iteration 0
Number of file descriptors for process 67985: 82
Iteration 1
Number of file descriptors for process 67985: 82
Iteration 2
Number of file descriptors for process 67985: 85
Iteration 3
Number of file descriptors for process 67985: 87
Iteration 4
Number of file descriptors for process 67985: 91
Iteration 5
Number of file descriptors for process 67985: 93
Iteration 6
Number of file descriptors for process 67985: 96
Iteration 7
Number of file descriptors for process 67985: 99
Iteration 8
Number of file descriptors for process 67985: 103
Iteration 9
Number of file descriptors for process 67985: 84
```
With many async concurrent requests these fds pile up even worse.
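As a stdlib-only illustration of the same measurement (not part of the repro above, and Linux-only since it counts entries in `/proc/<pid>/fd` instead of using psutil), unclosed sockets are an easy way to see the descriptor count climb the way the output shows:

```python
# Stdlib-only sketch (Linux-only): count this process's open file
# descriptors via /proc, then show how unclosed sockets inflate the count.
import os
import socket

def num_fds() -> int:
    # /proc/<pid>/fd holds one entry per open descriptor on Linux.
    return len(os.listdir(f"/proc/{os.getpid()}/fd"))

baseline = num_fds()
leaked = [socket.socket() for _ in range(5)]  # five sockets, never closed
assert num_fds() >= baseline + 5              # count grew by at least five
for s in leaked:
    s.close()                                 # releasing them...
assert num_fds() == baseline                  # ...restores the baseline
```

The same mechanism is at work in the repro: each HTTP connection that is pooled but never closed holds a descriptor open.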
Thanks for this @PrathamSoni.
Just updated the async requests to use aiohttp for OpenAI and Azure (including OpenAI-compatible APIs like Anyscale), which might help. Will update this ticket once it's through CI/CD.
I'll have a fix out for you by tomorrow. Also reached out via LinkedIn, if that works as a support channel.
Curious: why are you using litellm here?
Thinking through how to tackle this: it would be insufficient for BaseLLM to expose this function, since streaming requires the client session to stay open, and that's handled by CustomStreamWrapper in utils.py.
Since we have the Logging class instrumented across our code base, we can tell when requests complete. So we could instead close client sessions on_success / on_failure.
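The on_success / on_failure idea could take roughly this shape (a hedged sketch, not litellm's actual Logging API; the `SessionTracker` class and the socket used as a session stand-in are illustrative):

```python
# Illustrative sketch only: tie session cleanup to request-completion
# callbacks, so the session is closed whether the call succeeds or fails.
import socket

class SessionTracker:
    """Hypothetical helper: one session per in-flight request, closed on
    completion. A real implementation would hook litellm's logging."""
    def __init__(self):
        self._sessions = {}

    def register(self, request_id, session):
        self._sessions[request_id] = session

    def on_success(self, request_id):
        self._close(request_id)

    def on_failure(self, request_id):
        self._close(request_id)

    def _close(self, request_id):
        session = self._sessions.pop(request_id, None)
        if session is not None:
            session.close()  # releases the underlying file descriptor

tracker = SessionTracker()
s = socket.socket()          # stand-in for an HTTP client session
tracker.register("req-1", s)
tracker.on_success("req-1")  # session closed here
assert s.fileno() == -1      # a closed socket reports fd -1
```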
We're broadly interested in leveraging the router for load balancing + caching, so we want to improve the scaling properties here as well.
Re: implementation: exploring a manual session (from the other thread) to see whether passing session information explicitly fixes this issue.
@PrathamSoni if you're looking for azure only - have you explored Azure Front Door + APIM?
https://shiroyama.medium.com/load-balancing-aoai-with-azure-front-door-d3efec8d92b1
Thanks for the resource. Unfortunately, we're pretty much off azure currently.
Ok update! Don't think it's actually the session. Could it be leaking elsewhere?
```python
import os

import psutil

%load_ext dotenv
%dotenv

def print_open_files():
    pid = os.getpid()
    proc = psutil.Process(pid)
    print(f"Number of file descriptors for process {pid}: {proc.num_fds()}")

import requests
import litellm
import asyncio

# litellm.set_verbose=True

async def make_request(i):
    response = await litellm.acompletion(
        model="anyscale/mistralai/Mistral-7B-Instruct-v0.1",
        messages=[
            {"role": "system", "content": "You're a chat bot"},
            {"role": "user", "content": "Whats 1+1"},
        ],
    )
    return response

print("init")
print_open_files()
for i in range(10):
    with requests.Session() as custom_session:
        adapter = requests.adapters.HTTPAdapter(pool_connections=100)
        custom_session.mount('http://', adapter)
        custom_session.mount('https://', adapter)
        litellm.client_session = custom_session
        tasks = [make_request(j) for j in range(10)]
        res = await asyncio.gather(*tasks)  # top-level await works in a notebook
    print_open_files()
print_open_files()
```
Ok, going to close this as well. Closing the session manually seems to be fine. The FD problem is broadly well documented: https://webcitation.org/6ICibHuyd. Going to rewrite some client code on my end. Appreciate the time and the aiohttp optimizations.
@PrathamSoni set up a dedicated support channel on LinkedIn for quicker replies. Do you have any suggestions for how we can improve this on our end?
Will sync with you over linkedin. Let's loop back here after some ideation.
The Feature
Allow BaseLLM/complete calls to close the underlying client session.
Motivation, pitch
The client session created here is not closed, leading to a large number of leaked file descriptors.
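One possible API shape for this request (hypothetical names, not litellm's actual classes; a plain socket stands in for the pooled HTTP session):

```python
# Hypothetical sketch: give the client an explicit close() plus context-
# manager support, so callers can deterministically release the session.
import socket

class BaseLLMClient:
    def __init__(self):
        self._session = socket.socket()  # stand-in for an HTTP client session

    def close(self):
        self._session.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

with BaseLLMClient() as client:
    pass                               # completion calls would happen here
assert client._session.fileno() == -1  # session released on exit
```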
Twitter / LinkedIn details
No response