BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: cancel backend llm api calls, when client-side disconnects #3341

Closed by krrishdholakia 6 months ago

krrishdholakia commented 6 months ago

The Feature

In a case like this, the suggested approach is to propagate cancellation in the FastAPI application by introducing an operation that is cancelled once the client disconnects:

from fastapi import FastAPI, BackgroundTasks
from fastapi.responses import StreamingResponse
from starlette.requests import Request
import asyncio
import time
from typing import Generator

app = FastAPI()

def stream_slow_data(cancel_token: asyncio.Event) -> Generator[str, None, None]:
    for i in range(100):
        if cancel_token.is_set():  # if cancel token was set, stop streaming
            break
        yield f"{i}\n"
        time.sleep(0.1)

@app.get('/')
async def root(request: Request, bt: BackgroundTasks):
    cancel_token = asyncio.Event()

    def abort_stream(response_started):
        # the event is set when the call is cancelled
        cancel_token.set()

    request.add_event_handler('disconnect', abort_stream)

    return StreamingResponse(stream_slow_data(cancel_token))

In this example, when the client disconnects, the disconnect handler sets the cancellation token, which the streaming generator checks on each iteration; once the token is set, it stops streaming.
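For reference, Starlette's Request object has no add_event_handler method (as later comments in this thread point out), so the snippet above is only a rough illustration. A minimal working sketch of the same idea, using Starlette's public Request.is_disconnected() coroutine polled inside an async generator, might look like this:

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from starlette.requests import Request

app = FastAPI()

async def stream_slow_data(request: Request):
    for i in range(100):
        # Request.is_disconnected() is part of Starlette's public API;
        # it returns True once the client has gone away.
        if await request.is_disconnected():
            break  # stop streaming (and any upstream work) on disconnect
        yield f"{i}\n"
        await asyncio.sleep(0.1)

@app.get('/')
async def root(request: Request):
    return StreamingResponse(stream_slow_data(request))

Polling between yields avoids any event wiring: the generator simply notices the disconnect on its next iteration and stops, which also stops whatever upstream work feeds it.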

Motivation, pitch

user request

Twitter / LinkedIn details

cc: @NiklasWilson

ishaan-jaff commented 6 months ago

done

mofanke commented 3 months ago

@ishaan-jaff Sorry, request.add_event_handler('disconnect', abort_stream) raises AttributeError: 'Request' object has no attribute 'add_event_handler'. I need this feature and don't know how to make it work.

krrishdholakia commented 3 months ago

@mofanke What's the ask? Is the backend LLM API call not disconnecting? And is your backend API vllm?

mofanke commented 3 months ago

> @mofanke What's the ask? Is the backend LLM API call not disconnecting? And is your backend API vllm?

@app.get('/')
async def root(request: Request, bt: BackgroundTasks):
    cancel_token = asyncio.Event()

    def abort_stream(response_started):
        # the event is set when the call is cancelled
        cancel_token.set()

    request.add_event_handler('disconnect', abort_stream)

    return StreamingResponse(stream_slow_data(cancel_token))

request.add_event_handler('disconnect', abort_stream) does not work; it raises AttributeError: 'Request' object has no attribute 'add_event_handler'.

So the backend LLM API call is not cancelled when the client disconnects.

krrishdholakia commented 3 months ago

@mofanke the code I shared was sample code, meant to give a rough idea of how to do this

mofanke commented 3 months ago

> @mofanke the code I shared was sample code, meant to give a rough idea of how to do this

Got it, thanks.

mofanke commented 3 months ago

@ishaan-jaff I noticed this commit (498bfa9) titled "fix - revert check_request_disconnection". It appears to remove the check_request_disconnection functionality from the chat_completion and completion routes in the proxy_server.py file.

I'm curious about the reasoning behind this change. Was there an issue with the previous implementation of check_request_disconnection? Or is this related to this feature request?
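For context on what such a check can look like: a disconnect watcher is typically a background task that polls the request and cancels the in-flight backend call. The helper below is a hypothetical sketch of that shape; the names watch_for_disconnect and call_backend_llm are illustrative, not litellm's actual code:

import asyncio
from starlette.requests import Request

async def watch_for_disconnect(request: Request, llm_task: asyncio.Task) -> None:
    # Hypothetical sketch: poll the client connection and cancel the
    # backend LLM task once the client goes away.
    while not llm_task.done():
        if await request.is_disconnected():
            llm_task.cancel()  # raises CancelledError inside the backend call
            return
        await asyncio.sleep(0.5)  # polling interval chosen arbitrarily

# usage inside a route handler (names are illustrative):
#   llm_task = asyncio.create_task(call_backend_llm(payload))
#   watcher = asyncio.create_task(watch_for_disconnect(request, llm_task))
#   try:
#       return await llm_task
#   finally:
#       watcher.cancel()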