BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Sometimes tasks created with asyncio.create_task get destroyed while they are pending #5559

Open Peter4daggai opened 2 months ago

Peter4daggai commented 2 months ago

What happened?

I have a CustomLogger and noticed that I sometimes got "Task was destroyed but it is pending!" The destroyed task was async_success_handler in litellm_logging.py. After some investigation I found the cause in utils.py: asyncio.create_task is called but the returned task is never stored anywhere. The event loop keeps only a weak reference to a task, so if nothing else holds a strong reference, the task can be garbage collected while it is still pending. After changing the code in utils.py the problem disappeared.
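A minimal, self-contained sketch of the pattern involved (illustrative names; `log_event` is a stand-in, not litellm code). It shows the strong-reference idiom from the asyncio docs: keep the task in a set and discard it in a done callback so it cannot be garbage collected while pending.

```python
import asyncio

background_tasks = set()  # strong references keep pending tasks alive
results = []

async def log_event(msg):
    # Stand-in for an async logging callback.
    await asyncio.sleep(0)
    results.append(msg)

async def main():
    # Fragile: asyncio.create_task(log_event("x")) with the return value
    # dropped leaves only the event loop's weak reference to the task.
    # Robust: hold a strong reference until the task completes.
    task = asyncio.create_task(log_event("logged"))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    await task

asyncio.run(main())
```

In a real service the task would typically not be awaited immediately (that is the whole point of fire-and-forget logging), which is exactly when the missing strong reference bites.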

I added the following to utils.py:

futures_set = set()  # strong references keep pending tasks alive

# Wraps asyncio.create_task so the loop's weak reference is backed
# by a strong one until the task completes
def create_task(coro, *, done_callback=None, name=None, context=None):
    # NOTE: the context keyword requires Python 3.11+
    task = asyncio.create_task(coro, name=name, context=context)
    futures_set.add(task)

    def dcb(t):
        # Retrieve the result so any exception is marked as retrieved;
        # without this, asyncio sometimes emits a RuntimeWarning.
        try:
            t.result()
        except Exception:
            pass
        finally:
            futures_set.discard(t)

    if done_callback is not None:
        task.add_done_callback(done_callback)
    task.add_done_callback(dcb)

    return task

I then replaced all calls to asyncio.create_task with this create_task wrapper.
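A runnable usage sketch of the wrapper described above (reproduced here, slightly simplified by dropping the 3.11-only context keyword, so the example is self-contained; this async_success_handler is a stand-in for litellm's handler of that name):

```python
import asyncio

futures_set = set()  # strong references keep pending tasks alive

def create_task(coro, *, done_callback=None, name=None):
    task = asyncio.create_task(coro, name=name)
    futures_set.add(task)

    def dcb(t):
        # Retrieve the result so exceptions are marked as retrieved,
        # then drop the strong reference now that the task is done.
        try:
            t.result()
        except Exception:
            pass
        finally:
            futures_set.discard(t)

    if done_callback is not None:
        task.add_done_callback(done_callback)
    task.add_done_callback(dcb)
    return task

done = []

async def async_success_handler():
    # Stand-in for the async logging callback.
    await asyncio.sleep(0)
    done.append("logged")

async def main():
    # Before: asyncio.create_task(async_success_handler())
    # After:  the wrapper holds a strong ref until the task finishes.
    task = create_task(async_success_handler(), name="success-log")
    await task

asyncio.run(main())
```

The done callback both frees the reference (so no leak) and retrieves the result (so failed logging tasks do not produce "exception was never retrieved" noise).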

Peter Friebel Dagg AI

Relevant log output

No response

Twitter / LinkedIn details

No response

krrishdholakia commented 2 months ago

@Peter4daggai this has the potential to introduce memory leaks, as the garbage collection would no longer be automatically handled

When are you seeing these errors? I don't see it when running litellm on a server

Peter4daggai commented 2 months ago

Hello Krish, I have a CustomLogger in a FastAPI deployment with plenty of other async tasks running. Sometimes the tasks created by the code in utils.py (the async logging calls) are garbage collected before they get a chance to run. Your statement that this would cause memory leaks is not accurate: the wrapper always adds a done callback that removes the reference from the set, so completed tasks are still freed. This is in fact the pattern described in the Python docs here: https://docs.python.org/3/library/asyncio-task.html#creating-tasks

It doesn't happen often, but I cannot afford to miss a single log call, because my priority queue makes decisions based on the request count and token throughput.

Best regards, Peter Friebel Dagg AI
