BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Specify ability to close client_session #788

Closed: PrathamSoni closed this issue 11 months ago

PrathamSoni commented 11 months ago

The Feature

Allow BaseLLM/complete calls to close the underlying client session.

Motivation, pitch

    def create_client_session(self):
        if litellm.client_session: 
            session = litellm.client_session
        else: 
            session = requests.Session()

        return session

The client session created here is never closed, leading to a large number of leaked file descriptors.
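
As a workaround-style illustration (a minimal sketch, not litellm's fix): the caller can own the session, hand it to litellm via litellm.client_session, and close it explicitly once the call is done. The model name just mirrors the repro later in this thread.

import requests
import litellm
from litellm import completion

# Sketch only: caller-owned session, closed explicitly when finished.
session = requests.Session()
litellm.client_session = session
try:
    response = completion(
        model="anyscale/mistralai/Mistral-7B-Instruct-v0.1",
        messages=[{"role": "user", "content": "Whats 1+1"}],
    )
finally:
    session.close()  # releases pooled sockets / file descriptors
    litellm.client_session = None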

Twitter / LinkedIn details

No response

krrishdholakia commented 11 months ago

Hey @PrathamSoni thanks for the issue - do you have a way for me to repro this bug?

I'll push a fix for it.

PrathamSoni commented 11 months ago
import os
import psutil
%load_ext dotenv
%dotenv

def print_open_files():
    pid = os.getpid()
    proc = psutil.Process(pid)

    print(f"Number of file descriptors for process {pid}: {proc.num_fds()}")

from litellm import completion

print_open_files()
for i in range(10):
    response = completion(
        model="anyscale/mistralai/Mistral-7B-Instruct-v0.1",
        messages=[
                {"role": "system", "content": "You're a chat bot"},
                {"role": "user", "content": "Whats 1+1"},
            ]
    )
    # print(response)
    print("Iteration", i)
    print_open_files()

Gives something like

Iteration 0
Number of file descriptors for process 67985: 82
Iteration 1
Number of file descriptors for process 67985: 82
Iteration 2
Number of file descriptors for process 67985: 85
Iteration 3
Number of file descriptors for process 67985: 87
Iteration 4
Number of file descriptors for process 67985: 91
Iteration 5
Number of file descriptors for process 67985: 93
Iteration 6
Number of file descriptors for process 67985: 96
Iteration 7
Number of file descriptors for process 67985: 99
Iteration 8
Number of file descriptors for process 67985: 103
Iteration 9
Number of file descriptors for process 67985: 84

With many concurrent async requests, these fds pile up even worse.

krrishdholakia commented 11 months ago

Thanks for this @PrathamSoni.

Just updated the async requests to use aiohttp for OpenAI and Azure (this includes OpenAI-compatible APIs like Anyscale), which might help. I'll update the ticket once it's through CI/CD.
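
For context, the general pattern (a minimal sketch of plain aiohttp usage, not litellm's actual internals): opening the session as an async context manager guarantees it gets closed, so its sockets don't linger as open file descriptors.

import asyncio
import aiohttp

async def fetch_status(url: str) -> int:
    # The async context managers close the session and response deterministically,
    # even if an exception is raised mid-request.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return resp.status

print(asyncio.run(fetch_status("https://docs.litellm.ai/docs/")))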

I'll have a fix out for you by tomorrow. I also reached out via LinkedIn, if that works as a support channel.

Curious - why're you using litellm here?

krrishdholakia commented 11 months ago

Thinking about how to tackle this: it would be insufficient for BaseLLM to expose this function, because streaming requires the client session to stay open, and that's handled by CustomStreamWrapper in utils.py.

Since we have the Logging class instrumented across our codebase, we can tell when requests complete. So we could instead close client sessions on_success / on_failure.
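
A rough sketch of that idea - the hook names and structure here are hypothetical, not the actual Logging class:

import requests

class LoggingSketch:
    """Hypothetical success/failure hooks; illustrates closing the client session once a request finishes."""

    def __init__(self, client_session: requests.Session):
        self.client_session = client_session

    def on_success(self, response) -> None:
        self._close_session()

    def on_failure(self, exception) -> None:
        self._close_session()

    def _close_session(self) -> None:
        # A streaming response would need to be fully drained before this runs,
        # since CustomStreamWrapper still relies on the open session.
        if self.client_session is not None:
            self.client_session.close()
            self.client_session = None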

PrathamSoni commented 11 months ago

> Curious - why're you using litellm here?

We're broadly interested in leveraging the router for load balancing + caching, so we want to improve the scaling properties here as well.

Re: implementation - I'm exploring the manual-session approach from the other thread (constructing the session myself and handing the session info to litellm) to see if that fixes this issue.
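
For reference, a rough sketch of that router use case - the deployment name, key, and parameters below are illustrative assumptions, not config from this thread:

from litellm import Router

# Illustrative model list with a placeholder key; swap in real deployments.
router = Router(
    model_list=[
        {
            "model_name": "mistral-7b",
            "litellm_params": {
                "model": "anyscale/mistralai/Mistral-7B-Instruct-v0.1",
                "api_key": "sk-placeholder",
            },
        }
    ]
)

response = router.completion(
    model="mistral-7b",
    messages=[{"role": "user", "content": "Whats 1+1"}],
)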

krrishdholakia commented 11 months ago

@PrathamSoni if you're looking for Azure only - have you explored Azure Front Door + APIM?

https://shiroyama.medium.com/load-balancing-aoai-with-azure-front-door-d3efec8d92b1

PrathamSoni commented 11 months ago

Thanks for the resource. Unfortunately, we're pretty much off Azure currently.

PrathamSoni commented 11 months ago

OK, update! I don't think it's actually the session. Could it be leaking elsewhere?

import os
import psutil
%load_ext dotenv
%dotenv

def print_open_files():
    pid = os.getpid()
    proc = psutil.Process(pid)

    print(f"Number of file descriptors for process {pid}: {proc.num_fds()}")

import requests
import litellm
import asyncio
# litellm.set_verbose=True

async def make_request(i):
    response = await litellm.acompletion(
        model="anyscale/mistralai/Mistral-7B-Instruct-v0.1",
        messages=[
                {"role": "system", "content": "You're a chat bot"},
                {"role": "user", "content": "Whats 1+1"},
            ]
    )

    return response

print("init")
print_open_files()

for i in range(10):
    with requests.Session() as custom_session:
        adapter = requests.adapters.HTTPAdapter(pool_connections=100)
        custom_session.mount('http://', adapter)
        custom_session.mount('https://', adapter)
        litellm.client_session = custom_session
        tasks = [make_request(j) for j in range(10)]
        res = await asyncio.gather(*tasks)
    print_open_files()

print_open_files()

PrathamSoni commented 11 months ago

OK, going to close this as well. Closing the session manually seems to be fine. The FD problem is broadly well documented: https://webcitation.org/6ICibHuyd. I'm going to rewrite some client code on my end. Appreciate the time and the aiohttp optimizations.

krrishdholakia commented 11 months ago

@PrathamSoni I've set up a dedicated support channel on LinkedIn for quicker responses. Do you have any suggestions for how we can improve this on our end?

PrathamSoni commented 11 months ago

Will sync with you over LinkedIn. Let's loop back here after some ideation.