ejfinneran / ratelimit

A Redis-backed rate limiter written in Ruby
MIT License

Implement threadsafe version of exec_within_threshold #33

Closed jamespeerless closed 5 years ago

jamespeerless commented 5 years ago

When using exec_within_threshold in an environment with multiple threads or processes, you can run into an issue where several threads read the count for a given subject at the same time. If the count they read is just below the threshold, all of them are allowed to execute before any of them performs the add that would push the count over the threshold.

For example, suppose you had three threads and a rate limit with a threshold of 20 in 600 seconds, and the current count was 19. All three threads would read the count of 19, the exceeded? check used in exec_within_threshold would evaluate to false for each of them, and all three threads would execute the block.
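For reference, exec_within_threshold is essentially a check-then-act loop, so the threshold check and the later add are separate Redis round trips. The sketch below is a simplification of that shape (not the gem's exact source), just to show where the race sits:

```ruby
# Simplified shape of exec_within_threshold (illustrative, not the gem's exact code):
# the exceeded? read and the block's add are two separate Redis operations,
# so several threads can all pass the check at the same count.
def exec_within_threshold(subject, options = {})
  while exceeded?(subject, options)                       # step 1: read the current count
    sleep options[:interval].to_f / options[:threshold]   # back off and re-check
  end
  yield                                                   # step 2: run the block, which calls add later
end
```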

The fix requires a lock. Instead of implementing a lock from scratch, I used an already existing implementation by mlanett. Each thread must acquire the lock before it can read the count, and it releases the lock after incrementing the count. This ensures that the threshold is never exceeded.
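Conceptually the change looks like the sketch below. It is only an illustration of the approach, not the PR's code: the real change uses mlanett's redis-lock gem, whereas this stand-in uses a plain SET NX key, and the lock key name, TTL, and poll interval are made up for the example.

```ruby
# Illustration of the locking approach (not the PR's code; the PR uses
# mlanett's redis-lock gem). The lock key, 10s TTL, and 10ms poll are
# illustrative choices, not values from the PR.
def exec_and_increment_within_threshold(subject, options = {})
  lock_key = "ratelimit:lock:#{subject}"
  sleep 0.01 until redis.set(lock_key, "1", nx: true, ex: 10)      # acquire the lock
  begin
    # only the lock holder may read the count, so the read is no longer racy
    sleep options[:interval].to_f / options[:threshold] while exceeded?(subject, options)
    result = block_given? ? yield : nil
    add(subject)                                                   # count this execution before releasing
    result
  ensure
    redis.del(lock_key)                                            # let the next thread check the count
  end
end
```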

I wrote some tests in a work project where we use the ratelimit gem to verify the issue and to confirm that this change fixes it.

The test that exposes the issue, where exec_within_threshold allows more than the threshold number of executions in an interval, is here:

```ruby
rlKey = SecureRandom.uuid
rlSubject = SecureRandom.uuid
ratelimit = Ratelimit.new(rlKey, {:redis => $redis})

statusMap = { stop: 0, sleep: 0, run: 0, finished: 0 }

# the sleep is not required to get the bug to happen, but it doesn't happen
# 100% of the time without it and we don't want randomly failing tests
threads = [
  Thread.new { ratelimit.exec_within_threshold(rlSubject, { interval: 10, threshold: 1 }) { sleep 0.5; ratelimit.add(rlSubject) } },
  Thread.new { ratelimit.exec_within_threshold(rlSubject, { interval: 10, threshold: 1 }) { sleep 0.5; ratelimit.add(rlSubject) } },
  Thread.new { ratelimit.exec_within_threshold(rlSubject, { interval: 10, threshold: 1 }) { sleep 0.5; ratelimit.add(rlSubject) } },
  Thread.new { ratelimit.exec_within_threshold(rlSubject, { interval: 10, threshold: 1 }) { sleep 0.5; ratelimit.add(rlSubject) } },
  Thread.new { ratelimit.exec_within_threshold(rlSubject, { interval: 10, threshold: 1 }) { sleep 0.5; ratelimit.add(rlSubject) } },
  Thread.new { ratelimit.exec_within_threshold(rlSubject, { interval: 10, threshold: 1 }) { sleep 0.5; ratelimit.add(rlSubject) } },
  Thread.new { ratelimit.exec_within_threshold(rlSubject, { interval: 10, threshold: 1 }) { sleep 0.5; ratelimit.add(rlSubject) } }
]

sleep 1

threads.each do |t|
  status_symbol = t.status ? t.status.to_sym : :finished
  statusMap[status_symbol] += 1
end

expect(statusMap[:finished]).to be > 1
```

The test with the fixed implementation looks like this:

```ruby
rlKey = SecureRandom.uuid
rlSubject = SecureRandom.uuid
ratelimit = Ratelimit.new(rlKey, {:redis => $redis})

statusMap = { stop: 0, sleep: 0, run: 0, finished: 0 }

threads = [
  Thread.new { ratelimit.exec_and_increment_within_threshold(rlSubject, { interval: 10, threshold: 1 }) },
  Thread.new { ratelimit.exec_and_increment_within_threshold(rlSubject, { interval: 10, threshold: 1 }) },
  Thread.new { ratelimit.exec_and_increment_within_threshold(rlSubject, { interval: 10, threshold: 1 }) },
  Thread.new { ratelimit.exec_and_increment_within_threshold(rlSubject, { interval: 10, threshold: 1 }) },
  Thread.new { ratelimit.exec_and_increment_within_threshold(rlSubject, { interval: 10, threshold: 1 }) },
  Thread.new { ratelimit.exec_and_increment_within_threshold(rlSubject, { interval: 10, threshold: 1 }) },
  Thread.new { ratelimit.exec_and_increment_within_threshold(rlSubject, { interval: 10, threshold: 1 }) }
]

sleep 1

threads.each do |t|
  status_symbol = t.status ? t.status.to_sym : :finished
  statusMap[status_symbol] += 1
end

expect(statusMap[:finished]).to eq(1)
```

coveralls commented 5 years ago


Coverage decreased (-5.1%) to 93.284% when pulling 72eeab05387f329f82591a955189cb96ed90c457 on jamespeerless:threadsafe-exec-within-threshold into 0e60d3484b4b1f984f806b1c89c601adf92fd544 on ejfinneran:master.

jamespeerless commented 5 years ago

Reimplemented without the need for a lock.
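For context on what a lock-free version can look like: one common pattern is to do the threshold check and the increment inside a single Lua script, which Redis executes atomically. The sketch below is only an illustration of that idea against a plain counter key; it is not the code from this PR, and the gem's real storage uses time-bucketed counts rather than a single counter.

```ruby
# Lock-free illustration: check and increment happen in one Lua script, which
# Redis runs atomically, so no two callers can both see a count below the
# threshold and both increment. Key layout and method names are hypothetical.
CHECK_AND_INCR = <<~LUA
  local count = tonumber(redis.call('GET', KEYS[1]) or '0')
  if count >= tonumber(ARGV[1]) then
    return 0                              -- over threshold, caller must wait
  end
  redis.call('INCR', KEYS[1])
  redis.call('EXPIRE', KEYS[1], ARGV[2])
  return 1                                -- slot reserved atomically
LUA

def try_exec(redis, subject, threshold:, interval:)
  allowed = redis.eval(CHECK_AND_INCR,
                       keys: ["ratelimit:#{subject}"],
                       argv: [threshold, interval]) == 1
  yield if allowed && block_given?
  allowed
end
```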