@andreaskoepf Happy to pick this up if it's still free - will add some local load testing and report in the PR
The following are my two cents after reading through the source code and documentation for slowapi and fastapi-limiter:
Since in-process memory and MongoDB are eliminated due to scalability issues and relative latency (compared to memory-based alternatives), the two remaining storage options are Redis and Memcached, of which Redis seems the better alternative due to its replication capabilities and Lua scripting.
One limitation of slowapi is that the `request` argument must be explicitly declared and type-hinted as `fastapi.Request` in the endpoint's parameters, like this:

```python
@limiter.limit("5/minute")
async def myendpoint(request: Request):
    pass
```
After reviewing the existing backend code, the request body is currently type-hinted as a subclass of `pydantic.BaseModel` to benefit from fastapi's automatic body conversion. However, this runs into a compatibility problem with slowapi, since slowapi checks whether the `request` argument is a subclass of `starlette.Request` in these few lines.
Solving this issue would require changing the parameter naming convention of all our current routes, like this:
```python
@router.post("/", response_model=protocol_schema.AnyTask)  # work with Union once more types are added
def request_task(
    *,
    db: Session = Depends(deps.get_db),
    api_key: APIKey = Depends(deps.get_api_key),
    body: protocol_schema.TaskRequest,
    request: Request,
) -> Any:
    pass
```
In contrast, fastapi-limiter's usage is relatively straightforward and more compatible with the current code base:
```python
import redis.asyncio as redis
import uvicorn
from fastapi import Depends, FastAPI
from fastapi_limiter import FastAPILimiter
from fastapi_limiter.depends import RateLimiter

app = FastAPI()


@app.on_event("startup")
async def startup():
    # Bind to a new name so the `redis` module is not shadowed inside this function.
    redis_connection = redis.from_url("redis://localhost", encoding="utf-8", decode_responses=True)
    await FastAPILimiter.init(redis_connection)


@app.get("/", dependencies=[Depends(RateLimiter(times=2, seconds=5))])
async def index():
    return {"msg": "Hello World"}


if __name__ == "__main__":
    uvicorn.run("main:app", reload=True)
```
Slowapi supports shared rate limits across multiple endpoints, whereas fastapi-limiter's Redis key scheme makes shared limits across multiple endpoints impossible.
However, in our current use case, shared limits do not seem to be a must-have feature.
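For reference, a minimal sketch of what a shared limit looks like in slowapi (the `shared_limit` decorator is ported from flask-limiter; the endpoint names are illustrative, and the endpoints would still need to be registered on an app with `app.state.limiter = limiter`):

```python
from fastapi import Request
from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)


# Both endpoints draw from the same "tasks" bucket: 10 requests/minute combined.
@limiter.shared_limit("10/minute", scope="tasks")
async def request_task(request: Request):
    ...


@limiter.shared_limit("10/minute", scope="tasks")
async def ack_task(request: Request):
    ...
```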
Both allow limits on multiple keys with different thresholds for a single endpoint, configured via the `key_func` parameter for slowapi or the `identifier` parameter for fastapi-limiter.
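For example, here is a minimal sketch of a custom `identifier` for fastapi-limiter (the `X-API-Key` header name is an assumption for illustration):

```python
from fastapi import Request


async def api_key_identifier(request: Request) -> str:
    # Key the limit on the API key when present (hypothetical header name),
    # falling back to the client IP for unauthenticated requests.
    api_key = request.headers.get("X-API-Key")
    if api_key:
        return f"api_client:{api_key}"
    return f"ip:{request.client.host if request.client else 'unknown'}"


# Usage: Depends(RateLimiter(times=20, minutes=1, identifier=api_key_identifier))
```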
Both slowapi and fastapi-limiter offer very similar functionality (and arguably performance, since both use Lua scripts with Redis). My personal preference is fastapi-limiter, since the syntax is cleaner and easier to implement, provided Redis is used as the storage backend and shared limits across endpoints are not a required feature. Hope this is helpful 😄
@GraemeHarris Thanks for offering to help with testing; load-test code would be appreciated!
@kiritowu Thanks for the detailed comparison. We will follow your recommendation and use fastapi-limiter. If you want to work on this, could you first please submit a separate PR which adds redis to our docker-compose.yaml.
Sure, so there will be ~3 PRs for this issue:
@GraemeHarris have you started any code on this? If yes, then feel free to take this issue, as it is around midnight in my timezone; I'll probably only have time to work on this ~10 hours later.
@kiritowu thanks for the write-up :). I had only done some of the initial research - but I think I can get started on the redis-in-docker-compose step if that works for you? I'll update here on whether I get to the next steps. I am free for my evening now (EU timezone).
Sgtm!
👍 Thanks for the peer coordination!
@kiritowu https://github.com/LAION-AI/Open-Assistant/pull/187 Basic updates for redis - the containers spin up alright on my local machine, so hopefully this sets you up to run when you're back in
@andreaskoepf I have completed a working prototype for the rate limiter. There is one part that I wish to clarify, which is the limits-on-multiple-keys portion:
AFAIK, the existing api_client represents any authenticated party that could be interacting with the service, either as a user (via the website) or as a bot (via Discord bots).
When you mention "limit on multiple keys, e.g. user and api_client", does it mean that a database lookup will be performed to determine whether the given API key represents a signed-in user or just a bot?
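In the meantime, a rough sketch of how stacking two limiters with different thresholds could look (the identifier functions and header names are hypothetical; a real implementation might do the database lookup described above):

```python
from fastapi import Depends, FastAPI, Request
from fastapi_limiter.depends import RateLimiter

app = FastAPI()  # assumes FastAPILimiter.init(...) is called on startup


# Hypothetical identifiers, one per key.
async def api_client_identifier(request: Request) -> str:
    return f"api_client:{request.headers.get('X-API-Key', 'anonymous')}"


async def user_identifier(request: Request) -> str:
    return f"user:{request.headers.get('X-User-Id', 'anonymous')}"


# Each RateLimiter dependency keeps its own Redis bucket, so the api_client
# and user keys can have different thresholds on the same endpoint.
@app.post(
    "/tasks",
    dependencies=[
        Depends(RateLimiter(times=100, minutes=1, identifier=api_client_identifier)),
        Depends(RateLimiter(times=10, minutes=1, identifier=user_identifier)),
    ],
)
async def request_task():
    return {"ok": True}
```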
@andreaskoepf Another part is the `OasstErrorCode`: will `TOO_MANY_REQUESTS = 429` be a suitable error code (since 429 is also the HTTP status code for too many requests)?
```python
from enum import IntEnum


class OasstErrorCode(IntEnum):
    """
    Error codes of the Open-Assistant backend API.

    Ranges:
      0-1000: general errors
      1000-2000: tasks endpoint
      2000-3000: prompt_repository
    """

    # 0-1000: general errors
    GENERIC_ERROR = 0
    DATABASE_URI_NOT_SET = 1
    API_CLIENT_NOT_AUTHORIZED = 2
    SERVER_ERROR = 3
    TOO_MANY_REQUESTS = 429
```
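For context, a sketch of how the limiter could surface a 429 via fastapi-limiter's rate-limit callback (the wiring is an assumption; the init parameter is named `callback` or `http_callback` depending on the fastapi-limiter version):

```python
from fastapi import HTTPException, Request, Response
from starlette.status import HTTP_429_TOO_MANY_REQUESTS


# Invoked by fastapi-limiter when a limit is exceeded; pexpire is the
# remaining window in milliseconds.
async def rate_limit_callback(request: Request, response: Response, pexpire: int):
    raise HTTPException(
        status_code=HTTP_429_TOO_MANY_REQUESTS,
        detail="Too Many Requests",
        headers={"Retry-After": str(max(1, pexpire // 1000))},
    )


# Assumed wiring at startup:
# await FastAPILimiter.init(redis_connection, http_callback=rate_limit_callback)
```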
just my two cents, since andreas is blocked a bit:
@andreaskoepf & @GraemeHarris, the backend rate-limiting PoC is now complete and is up for PR.
A few to-dos that I have identified:
Let me know your thoughts. peace out ✌️
I will close this, as basic rate limiting is working efficiently and the existing code acts as a nice example of how to add rate limits in case that should be necessary.
Add a throttling system to the backend REST endpoints that tracks and limits the interaction frequency of users/api-clients. This will help us prevent or slow down users/automated systems from flooding our database. Preferably we would like to limit on multiple keys, e.g. user and api_client (each with different thresholds).