ejfinneran / ratelimit

A Redis-backed rate limiter written in Ruby
MIT License

Inaccurate rate limit count when wrapping around bucket index #45

Closed phillipp closed 9 months ago

phillipp commented 1 year ago

The current implementation of the Ratelimit class uses a fixed number of buckets (bucket_count) to store rate-limiting data. When the bucket index wraps around, it may include old values in the count, leading to inaccurate rate limiting.

This issue occurs when the counter is not updated consistently. If the add method is not called for an extended period, the buckets at bucket + 1 and bucket + 2 are never cleared. As a result, when the bucket index wraps around, the old values in those buckets are still present and get included in the count, causing unexpected behavior.
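To make the wraparound concrete, here is a minimal standalone sketch (the interval and bucket count are hypothetical example values, not the gem's defaults) showing that a modulo-based bucket index maps two timestamps one full span apart to the same bucket:

```ruby
# Illustration only: how a modulo-based bucket index wraps around.
# bucket_interval and bucket_count are hypothetical example values.
def bucket_for(time, bucket_interval, bucket_count)
  (time / bucket_interval).floor % bucket_count
end

bucket_interval = 5   # seconds covered by one bucket
bucket_count    = 10  # fixed number of buckets
span = bucket_interval * bucket_count  # 50 seconds covered by all buckets

t0 = 1_000_000
# The same index is reused one full span later, so a stale value written
# at t0 is counted again at t0 + span if it was never cleared.
bucket_for(t0, bucket_interval, bucket_count) ==
  bucket_for(t0 + span, bucket_interval, bucket_count)  # => true
```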

Proposed Solutions:

  1. Modify the count method to check if the queried buckets are expired based on their timestamp. Store a timestamp for each bucket when it is updated and compare it with the current time when fetching the count. This ensures that only unexpired bucket values are included in the count, even if the add method is not called consistently.

  2. Update the get_bucket method to use the timestamp directly instead of mapping it to a range of 0 to bucket_count. By doing so, you can avoid wrapping the bucket index and use a more straightforward approach to manage and remove old timestamp values. Modify the add and count methods accordingly to handle the new bucket indexing method and periodically remove expired keys from the Redis hashes.

These solutions aim to address the issue of inaccurate rate limiting due to wrapping around the bucket index and ensure consistent rate limiting even when the add method is not called regularly.
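As a rough in-memory sketch of solution 1 (all class and method names here are hypothetical; the real implementation would keep the per-bucket timestamps in Redis next to the counts), the count method only includes buckets whose last write falls inside the queried interval:

```ruby
# Hypothetical sketch of solution 1: store a last-update timestamp per
# bucket and skip stale buckets when counting, so a wrapped-around bucket
# that was never cleared is not included.
class TimestampedBuckets
  def initialize(bucket_interval:, bucket_count:)
    @bucket_interval = bucket_interval
    @bucket_count    = bucket_count
    @counts     = Hash.new(0)  # bucket index => count
    @updated_at = {}           # bucket index => last write time
  end

  def add(count = 1, now = Time.now.to_i)
    bucket = (now / @bucket_interval) % @bucket_count
    @counts[bucket] += count
    @updated_at[bucket] = now
  end

  def count(interval, now = Time.now.to_i)
    @counts.sum do |bucket, value|
      # Only buckets written within the queried interval contribute;
      # a bucket carrying an old timestamp is treated as expired.
      ts = @updated_at[bucket]
      (ts && now - ts <= interval) ? value : 0
    end
  end
end
```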

My suggestion would be to use option 2. It would use longer keys in the hash, but would not add more keys. If necessary, the pruning of old timestamps could be done in a Redis Lua script.

phillipp commented 1 year ago

I have sketched the following class (only the updated methods are included) that uses "absolute" buckets and Redis Lua scripts to do the heavy lifting:

class AbsoluteRateLimit < Ratelimit
  COUNT_LUA_SCRIPT = <<-LUA
    local subject = KEYS[1]
    local oldest_bucket = tonumber(ARGV[1])
    local current_bucket = tonumber(ARGV[2])
    local count = 0

    for bucket = oldest_bucket + 1, current_bucket do
      local value = redis.call('HGET', subject, tostring(bucket))
      if value then
        count = count + tonumber(value)
      end
    end

    return count
  LUA

  MAINTENANCE_LUA_SCRIPT = <<-LUA
    local subject = KEYS[1]
    local oldest_bucket = tonumber(ARGV[1])

    -- Delete expired keys
    local all_keys = redis.call('HKEYS', subject)
    for _, key in ipairs(all_keys) do
      local bucket_key = tonumber(key)
      if bucket_key < oldest_bucket then
        redis.call('HDEL', subject, tostring(bucket_key))
      end
    end
  LUA

  def add(subject, count = 1)
    bucket = get_bucket
    subject = "#{@key}:#{subject}"

    # Clean up expired keys on roughly 1% of requests
    cleanup_expired_keys(subject) if rand < 0.01

    redis.multi do |transaction|
      transaction.hincrby(subject, bucket, count)
      transaction.expire(subject, @bucket_expiry + @bucket_interval)
    end.first
  end

  def count(subject, interval)
    interval = [[interval, @bucket_interval].max, @bucket_span].min
    oldest_bucket = get_bucket(Time.now.to_i - interval)
    current_bucket = get_bucket
    subject = "#{@key}:#{subject}"

    redis.eval(COUNT_LUA_SCRIPT, [subject], [oldest_bucket, current_bucket])
  end

  def get_bucket(time = Time.now.to_i)
    (time / @bucket_interval).floor
  end

  def cleanup_expired_keys(subject)
    oldest_bucket = get_bucket(Time.now.to_i - @bucket_expiry)
    redis.eval(MAINTENANCE_LUA_SCRIPT, [subject], [oldest_bucket])
  end
end
krtschmr commented 1 year ago

sounds reasonable

phillipp commented 1 year ago

@andreasknoepfle If you're interested, I could provide a pull request with an additional SCRIPT LOAD so we don't need to transfer the lua scripts on each method call.
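A small wrapper along these lines could cache the SHAs (class and method names below are assumptions for illustration, not the proposed pull request; `redis` is expected to respond to the redis-rb calls `script`, `evalsha`, and `eval`):

```ruby
# Sketch of the SCRIPT LOAD idea: load each Lua script once and invoke it
# by SHA with EVALSHA, so the script body is not re-sent on every call.
class ScriptCache
  def initialize(redis)
    @redis = redis
    @shas  = {}  # lua source => cached SHA1 digest
  end

  def eval_script(lua, keys:, argv:)
    sha = @shas[lua] ||= @redis.script(:load, lua)
    @redis.evalsha(sha, keys, argv)
  rescue StandardError => e
    # Redis forgets scripts on SCRIPT FLUSH or failover; re-load lazily
    # by falling back to plain EVAL and dropping the cached SHA.
    raise unless e.message.include?("NOSCRIPT")
    @shas.delete(lua)
    @redis.eval(lua, keys, argv)
  end
end
```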

We use the code above in production, works fine.