LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0

Add backend API rate limiting #175

Closed andreaskoepf closed 1 year ago

andreaskoepf commented 1 year ago

Add a throttling system to the backend REST endpoints that tracks and limits the interaction frequency of users/api-clients. This will help us prevent or slow down users/automated systems from flooding our database. Preferably we would like to limit on multiple keys, e.g. user and api_client (each with different thresholds).

  1. Review existing libraries such as laurentS/slowapi & fastapi-limiter and select one (please justify your choice).
  2. Make the necessary changes to the existing backend endpoints.

GraemeHarris commented 1 year ago

@andreaskoepf Happy to pick this up if free - will add some local load testing and report in the PR

kiritowu commented 1 year ago

The following are my two cents after reading through the source code and documentation for slowapi and fastapi-limiter:

Storage Options

Since in-built memory and MongoDB are eliminated due to scalability issues and relatively high latency (compared to memory-based alternatives), the two remaining storage options are Redis and Memcached, of which Redis seems the better alternative due to its replication capabilities and Lua scripting support.
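To illustrate why Lua scripting matters here: a rate-limit check has to read and update a counter atomically, which a server-side script guarantees. Below is a simplified fixed-window sketch using redis-py (an illustration only, not the exact script either library ships):

import redis

# KEYS[1]: counter key, ARGV[1]: request limit, ARGV[2]: window in seconds.
# Illustrative fixed-window algorithm; both libraries ship their own Lua scripts.
FIXED_WINDOW_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if current > tonumber(ARGV[1]) then
    return 1
end
return 0
"""

r = redis.Redis()
fixed_window = r.register_script(FIXED_WINDOW_LUA)

def allow(key: str, limit: int = 5, window_seconds: int = 60) -> bool:
    # INCR and EXPIRE run as one atomic server-side step, so concurrent
    # requests cannot race between reading and updating the counter.
    return fixed_window(keys=[key], args=[limit, window_seconds]) == 0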

Code Usage

One limitation of slowapi is that the request argument must be explicitly declared and type-hinted as fastapi.Request in the endpoint's parameters, like this:

@limiter.limit("5/minute")
async def myendpoint(request: Request):
    pass

After reviewing the existing backend code, I found that request bodies are type-hinted as subclasses of pydantic.BaseModel to benefit from fastapi's automatic body conversion. However, this runs into a compatibility problem with slowapi's syntax, since it checks whether the request argument is a subclass of starlette.Request in these few lines.

Solving this issue would require changing the parameter naming convention of all our current routes, like this:

@router.post("/", response_model=protocol_schema.AnyTask)  # work with Union once more types are added
def request_task(
    *,
    db: Session = Depends(deps.get_db),
    api_key: APIKey = Depends(deps.get_api_key),
    body: protocol_schema.TaskRequest,
    request: Request
) -> Any:
    pass

In contrast, fastapi-limiter's usage is relatively straightforward and more compatible with the current code base:

import redis.asyncio as redis
import uvicorn
from fastapi import Depends, FastAPI

from fastapi_limiter import FastAPILimiter
from fastapi_limiter.depends import RateLimiter

app = FastAPI()

@app.on_event("startup")
async def startup():
    # connect to Redis once at startup and hand the connection to the limiter
    redis_connection = redis.from_url("redis://localhost", encoding="utf-8", decode_responses=True)
    await FastAPILimiter.init(redis_connection)

@app.get("/", dependencies=[Depends(RateLimiter(times=2, seconds=5))])
async def index():
    return {"msg": "Hello World"}

if __name__ == "__main__":
    uvicorn.run("main:app", reload=True)

Shared Limits for Multiple Endpoints

Slowapi provides support for shared rate limits across multiple endpoints, whereas fastapi-limiter derives its Redis keys per route, which makes shared limits across multiple endpoints impossible.

However, in our current use case, shared limits do not seem to be a must-have feature.
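For comparison, here is a minimal sketch of slowapi's shared limit (route names and thresholds are hypothetical; note that each endpoint must accept a starlette Request, as discussed above):

from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# Both routes draw from a single shared "tasks" budget of 20 requests/minute
# in total, rather than 20/minute each.
@app.post("/tasks")
@limiter.shared_limit("20/minute", scope="tasks")
async def create_task(request: Request):
    return {"ok": True}

@app.get("/tasks")
@limiter.shared_limit("20/minute", scope="tasks")
async def list_tasks(request: Request):
    return {"ok": True}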

Multiple Key Support

Both allow limits on multiple keys with different thresholds for a single endpoint; this is configured via the key_func parameter for slowapi or the identifier parameter for fastapi-limiter, as sketched below.
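For illustration, a minimal sketch of stacking two fastapi-limiter dependencies with different identifiers (the header names, route, and thresholds are assumptions, not our actual scheme; FastAPILimiter.init is assumed to have run at startup):

from fastapi import Depends, FastAPI, Request
from fastapi_limiter.depends import RateLimiter

app = FastAPI()

# Hypothetical key functions: one rate-limit key per api_client, one per user.
async def api_client_identifier(request: Request) -> str:
    return "client:" + request.headers.get("X-API-Key", request.client.host)

async def user_identifier(request: Request) -> str:
    return "user:" + request.headers.get("X-User-Id", request.client.host)

# Stacked dependencies enforce both thresholds independently, e.g.
# 100 requests/minute per api_client and 20 requests/minute per user.
@app.post(
    "/api/v1/tasks",
    dependencies=[
        Depends(RateLimiter(times=100, minutes=1, identifier=api_client_identifier)),
        Depends(RateLimiter(times=20, minutes=1, identifier=user_identifier)),
    ],
)
async def request_task():
    return {"ok": True}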

Summary

Both slowapi and fastapi-limiter offer very similar functionality (and arguably similar performance, since both use Lua scripts with Redis). My personal preference is fastapi-limiter, since its syntax is cleaner and easier to implement, provided Redis is used as the storage backend and shared limits across endpoints are not a required feature. Hope this is helpful 😄

andreaskoepf commented 1 year ago

@GraemeHarris Thanks for offering help with testing, load test code would be appreciated!

@kiritowu Thanks for the detailed comparison. We will follow your recommendation and use fastapi-limiter. If you want to work on this, could you please first submit a separate PR that adds redis to our docker-compose.yaml?

kiritowu commented 1 year ago

Sure, so there will be around three PRs for this issue:

  1. redis-in-docker-compose
  2. implement-fastapi-limiter
  3. limiter-load-testing

@GraemeHarris have you started any code on this? If yes, then feel free to take this issue, as it is around midnight in my timezone and I will probably only have time to work on this about 10 hours from now.

GraemeHarris commented 1 year ago

@kiritowu thanks for the write-up :). I had only done some of the initial research, but I think I can get started on the redis-in-docker-compose step if that works for you? I'll update here if I get to the next steps. I am free this evening (EU timezone).

kiritowu commented 1 year ago

Sgtm!

andreaskoepf commented 1 year ago

👍 Thanks for the peer coordination!

GraemeHarris commented 1 year ago

@kiritowu https://github.com/LAION-AI/Open-Assistant/pull/187 Basic updates for redis - the containers spin up alright on my local machine, so hopefully this sets you up to run when you're back in.

kiritowu commented 1 year ago

@andreaskoepf I have completed a working prototype for the rate limiter. There is one part I wish to clarify, which is the limits-on-multiple-keys portion:

AFAIK, the existing api_client represents any authenticated client that could be interacting with the service, either as a user (via the website) or as a bot (via Discord bots).

When you mention "limit on multiple keys, e.g. user and api_client", does that mean a database lookup will be performed to determine whether the given API key represents a signed-in user or just a bot?

kiritowu commented 1 year ago

@andreaskoepf Another question is about the OasstErrorCode: would TOO_MANY_REQUESTS = 429 be a suitable error code? (That is also the HTTP status code for too many requests.)

class OasstErrorCode(IntEnum):
    """
    Error codes of the Open-Assistant backend API.

    Ranges:
         0-1000: general errors
      1000-2000: tasks endpoint
      2000-3000: prompt_repository
    """

    # 0-1000: general errors
    GENERIC_ERROR = 0
    DATABASE_URI_NOT_SET = 1
    API_CLIENT_NOT_AUTHORIZED = 2
    SERVER_ERROR = 3
    TOO_MANY_REQUESTS = 429
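
If we adopt 429, a hypothetical way to surface it would be a custom http_callback passed to FastAPILimiter.init (OasstError and its constructor are assumptions about our error type, not actual backend code):

from fastapi import Request, Response

async def oasst_rate_limit_callback(request: Request, response: Response, pexpire: int):
    # pexpire is the remaining window in milliseconds, supplied by fastapi-limiter
    raise OasstError(  # hypothetical error type
        f"Too many requests, retry after {pexpire} ms.",
        error_code=OasstErrorCode.TOO_MANY_REQUESTS,
        http_status_code=429,
    )

# at startup: await FastAPILimiter.init(redis_connection, http_callback=oasst_rate_limit_callback)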

yk commented 1 year ago

just my two cents, since andreas is blocked a bit:

kiritowu commented 1 year ago

@andreaskoepf & @GraemeHarris, the backend rate-limiting PoC is now completed and is up for PR.

A few To-Dos that I have identified:

Let me know what your thoughts are. Peace out ✌️

andreaskoepf commented 1 year ago

I will close this, as basic rate limiting is working efficiently and the existing code acts as a nice example of how to add rate limits in case that should become necessary.