engineyard / ey-cloud-recipes

A starter repo for custom chef recipes on EY's cloud platform. These are for reference, and do not indicate a supported status.
http://www.engineyard.com/products/cloud
Other
980 stars 291 forks source link

Updating Sidekiq Connection Pool #332

Closed thejbsmith closed 7 years ago

thejbsmith commented 7 years ago

Updating the sidekiq connection pool to take into account concurrency and number of workers when setting the connection pool.

Previously, the connection pool was set to the concurrency without any regard for the number of works. So, 8 works with a concurrecy of 25 woudl be 200 Sidekiq threads, but it would only have a connection pool of 25.

tpoland commented 7 years ago

I believe the connection pool with Sidekiq is per-worker already. The main issue is that when workers are starting in rapid succession ActiveRecord doesn't cycle the connection back through the pool fast enough to serve the next job. So a pool size of (concurrency 1.5) or (concurrency 2) be more appropriate for database connection counts.

thejbsmith commented 7 years ago

@tpoland, the connection pool with Sidekiq was only based on concurrency (take a look at the diff).

As an example, I currently have a utility server with a concurrency of 20, but with 8 works. The connection pool was being set to 20, causing connection issues since there can be 160 threads going at once.

The Sidekiq wiki mentions setting the pool to something close to or equal the number of threads. https://github.com/mperham/sidekiq/wiki/Advanced-Options#concurrency

I feel that concurrency * 2 would still be too low (at least in my situation with 8 workers). If you think that setting the pool to the number of threads is too much, what would be a good solution for that case?

tpoland commented 7 years ago

@thejbsmith so the issue is each worker maintains its own pool independent of the other workers. Your change uses the connection pool which multiplies the number of workers by the number of threads so you have the following scenarios:

Existing Model:

Proposed Revision:

Multiplier Model (threads*2):

Under your change you end up in a scenario where you aren't managing database connections effectively, and these do have a cost associated with them. You also quickly get to a position where you can allocate far more connections than your database could ever serve (MySQL's hard limit is 100,000 connections).

Any given thread will use at most one connection from the pool for one job. When work is complete that thread gets returned to the pool but isn't immediately available for work like the thread is, so the thread grabs a new one. What it boils down to, is how many individual jobs can your thread complete before its first connection gets back into the pool (ie. in a second or two).

If a multiplier of 2 doesn't work you could just increase the multiplier to 3 or 4. Additionally, the number of workers is usually going to be set based on instance size/type, so thats going to result in a variable multiplier anyway which makes it less of a consistent fit.

thejbsmith commented 7 years ago

Great explanation @tpoland. I was under the assumption that all workers used the same pool, not that each worker got it's own pool. Since that is the case, yes, I completely agree that workers * concurrency is could lead to an absurd number of db connections that will never be used.