Description

Develop a rate-limiting mechanism to control the number of requests each user can make based on their account type. FREEMIUM accounts should be limited to 5 requests per minute (RPM), while PREMIUM accounts are allowed up to 50 RPM. If the limit is exceeded, the system should reject additional requests and respond with an HTTP 429 Too Many Requests status until the rate limit resets.

User Stories

As a FREEMIUM User, I want to know when I’ve reached my request limit, so I understand when to wait before making more requests.
As a PREMIUM User, I want higher request limits, so I can access the service more frequently.
As a Developer, I want to control the rate of requests based on user type, so that resource usage stays fair and the service is not overloaded.

Details

Objective: Implement a rate limiter that dynamically applies limits based on account type, returning an error message if a user exceeds the allowed RPM.
Requirements:
- Define rate limits: 5 RPM for FREEMIUM accounts, 50 RPM for PREMIUM accounts.
- Use Redis or an in-memory cache to track and manage per-user request counts.
- Apply rate limiting to all authenticated endpoints, specifically /predict-similarity.
- Return an HTTP 429 Too Many Requests response with a message if the limit is exceeded.
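
For the in-memory option, a minimal fixed-window sketch could look like the following. The class name `InMemoryRateLimiter`, the `allow` method and the injectable `clock` are hypothetical names chosen for illustration, and the approach only works within a single process (multiple workers would each keep their own counts, which is where Redis becomes necessary).

```python
import threading
import time
from typing import Callable, Dict, Tuple

RATE_LIMITS = {"FREEMIUM": 5, "PREMIUM": 50}  # requests per minute, from the requirements
WINDOW_SECONDS = 60.0


class InMemoryRateLimiter:
    """Fixed-window counter per user (hypothetical helper, single process only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        # user_id -> (window start time, requests counted in that window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, user_id: str, account_type: str) -> bool:
        """Return True if the request may proceed, False if it should get a 429."""
        limit = RATE_LIMITS[account_type]
        now = self._clock()
        with self._lock:  # keeps check-and-increment atomic across threads
            start, count = self._windows.get(user_id, (now, 0))
            if now - start >= WINDOW_SECONDS:
                # The previous window has expired, so start a fresh one.
                start, count = now, 0
            if count >= limit:
                return False
            self._windows[user_id] = (start, count + 1)
            return True
```

Passing a fake clock makes the reset behaviour easy to exercise in tests without sleeping for a minute.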

Example Usage and Responses

Request:
- FREEMIUM user makes a sixth request within one minute.

Response:

{
"error": "Rate limit exceeded. Please wait before making additional requests."
}

Status Code: 429 Too Many Requests

Implementation Steps

Configure Rate Limiting Rules:
- Set FREEMIUM limit to 5 RPM and PREMIUM limit to 50 RPM.
- Store the limits in a configuration file or environment variable for easy adjustment.
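
One possible way to read the limits from environment variables is sketched below. The variable names `RATE_LIMIT_FREEMIUM_RPM` and `RATE_LIMIT_PREMIUM_RPM` and the function `load_rate_limits` are assumptions made for this example, not names defined by the project.

```python
import os
from typing import Dict, Mapping

# Defaults taken from the requirements: requests per minute per account type.
DEFAULT_LIMITS = {"FREEMIUM": 5, "PREMIUM": 50}


def load_rate_limits(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Read per-tier limits from RATE_LIMIT_<TIER>_RPM, falling back to the defaults."""
    limits = {}
    for tier, default in DEFAULT_LIMITS.items():
        raw = env.get(f"RATE_LIMIT_{tier}_RPM")  # hypothetical variable name
        value = default if raw is None else int(raw)  # int() rejects non-numeric input
        if value <= 0:
            raise ValueError(f"Rate limit for {tier} must be positive, got {value}")
        limits[tier] = value
    return limits
```

With no variables set, `load_rate_limits()` returns the 5 and 50 RPM defaults, so the limits can be adjusted per deployment without a code change.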
Implement Rate Limiter Using Redis:
- Use Redis to store user request counts and timestamps, expiring entries after one minute to reset the count.
- For each request, check the user’s current count in Redis.
  - If the count is below the limit, increment it and allow the request.
  - If the count meets the limit, return a 429 Too Many Requests response.
Apply Rate Limiting Middleware:
- Implement middleware or dependency in FastAPI to enforce rate limiting on all authenticated endpoints, specifically /predict-similarity.
- Use dependency injection to check user account type (FREEMIUM or PREMIUM) and enforce the corresponding rate limit.
Handle Responses and Errors:
- If a request is blocked due to rate limits, respond with:
  - HTTP 429 Too Many Requests
  - JSON message explaining the limit has been reached and advising the user to wait.
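
The blocked response described above could be assembled roughly as follows, independent of the web framework. The function name `build_rate_limit_response` is hypothetical, and the `Retry-After` header is an assumption added here because it tells the client how long to wait; the issue itself only asks for the status code and a JSON message.

```python
import json
from http import HTTPStatus
from typing import Dict, Tuple

ERROR_MESSAGE = "Rate limit exceeded. Please wait before making additional requests."


def build_rate_limit_response(retry_after_seconds: int) -> Tuple[int, Dict[str, str], str]:
    """Return (status code, headers, JSON body) for a blocked request."""
    headers = {
        "Content-Type": "application/json",
        # Assumption: advertise the remaining wait, never less than one second.
        "Retry-After": str(max(1, retry_after_seconds)),
    }
    body = json.dumps({"error": ERROR_MESSAGE})
    return int(HTTPStatus.TOO_MANY_REQUESTS), headers, body
```

When Redis is the store, the remaining time to live of the user's counter key would be a natural source for `retry_after_seconds`.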

Code Mockup

Here’s a simplified example using Redis to track requests.

from fastapi import APIRouter, HTTPException
from redis import Redis

router = APIRouter()

# Redis configuration
redis_client = Redis(host='localhost', port=6379, db=0)

# Rate limits in requests per minute (RPM)
RATE_LIMITS = {
    "FREEMIUM": 5,
    "PREMIUM": 50
}

# Length of the counting window
WINDOW_SECONDS = 60

# Rate limit function
def rate_limit(user_id: str, account_type: str):
    key = f"rate_limit:{user_id}"

    # INCR is atomic and creates the key with value 1 if it does not exist,
    # so concurrent requests cannot both act on a stale count.
    count = redis_client.incr(key)

    if count == 1:
        # First request in this window: start the one-minute expiry
        redis_client.expire(key, WINDOW_SECONDS)

    if count > RATE_LIMITS[account_type]:
        # Limit exceeded. FastAPI serializes this as {"detail": ...}; the
        # {"error": ...} body shown earlier needs a custom exception handler.
        raise HTTPException(
            status_code=429, detail="Rate limit exceeded. Please wait before making additional requests."
        )

# Example usage in endpoint
@router.get("/predict-similarity")
async def predict_similarity_endpoint(user_id: str, account_type: str):
    rate_limit(user_id, account_type)
    # Logic for similarity prediction here
    return {"message": "Prediction result"}

Edge Cases

Simultaneous Requests: Ensure race conditions are avoided by using Redis’ atomic increment functionality.
Network Delays or Retries: Handle cases where users may unintentionally exceed the limit due to network delays or retries.
Account Upgrades: Ensure rate limits reflect changes if a user upgrades from FREEMIUM to PREMIUM mid-session.
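
For the account upgrade case, one approach is to key the counter by user only and resolve the limit from the current account type on every request. The sketch below illustrates this under simplifying assumptions: `account_types` and `request_counts` are hypothetical in-memory stand-ins for the user store and the per-minute counters, and window expiry is left out for brevity.

```python
from typing import Dict

RATE_LIMITS = {"FREEMIUM": 5, "PREMIUM": 50}

# Hypothetical stand-ins for the user store and the per-minute counters.
account_types: Dict[str, str] = {"alice": "FREEMIUM"}
request_counts: Dict[str, int] = {}


def allow_request(user_id: str) -> bool:
    """Count the request against the limit of the user's current account type."""
    limit = RATE_LIMITS[account_types[user_id]]  # looked up on every request
    count = request_counts.get(user_id, 0)
    if count >= limit:
        return False
    request_counts[user_id] = count + 1
    return True


# Five requests pass, the sixth is blocked while the user is on FREEMIUM.
freemium_results = [allow_request("alice") for _ in range(6)]

# After an upgrade, the same counter is compared against the PREMIUM limit.
account_types["alice"] = "PREMIUM"
after_upgrade = allow_request("alice")
```

Because the count is kept and only the limit changes, the upgraded user is unblocked immediately without being granted a fresh window.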

GabrielEValenzuela / chatML

Implement request rate limiting based on account type #9