chaps-io / gush

Fast and distributed workflow runner using ActiveJob and Redis

Redis connection pool investigation #34

Closed · cabello closed this issue 6 years ago

cabello commented 8 years ago

We are trying to use Gush in production and we are constantly exhausting the available Redis connections.

If we run 2 workflows that fire ~5 workers each, everything runs fine. If we run 3 workflows, then we hit the connection limit.

I wonder if there is an easy way to calculate how many connections are needed. If I run a few thousand workflows, do I need concurrency times a constant factor (for example, 10 * 5 connections), or do I need thousands of connections plus (concurrency * constant factor)?
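A back-of-the-envelope sketch of the two scaling models in that question (all numbers below are assumptions, not measurements):

concurrency = 10      # assumed Sidekiq concurrency
constant    = 5       # assumed connections needed per worker
workflows   = 3_000   # assumed number of workflows

bounded   = concurrency * constant                # => 50, independent of workflow count
unbounded = workflows + (concurrency * constant)  # => 3_050, grows with every workflow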

pokonski commented 8 years ago

Hi @cabello! Thanks for reporting the issue. I didn't encounter it myself, even though I was running hundreds of workflows at the same time.

This sounds like a bug, so I'll try to reproduce it. Can you share your Redis settings?

pokonski commented 8 years ago

Alrighty, I think I got the solution. I modified the code to use the same ConnectionPool gem Sidekiq uses, so that should drastically reduce the number of Redis connections.
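For reference, a minimal sketch of the connection_pool pattern (an illustration, not Gush's exact internals; the pool size, timeout, and key name are assumed values):

require 'connection_pool'
require 'redis'

# One shared pool per process instead of one connection per caller.
REDIS_POOL = ConnectionPool.new(size: 5, timeout: 5) do
  Redis.new(url: ENV.fetch('REDIS_URL', 'redis://localhost:6379'))
end

# A connection is checked out for the duration of the block and then
# returned, so concurrent callers share at most `size` connections.
REDIS_POOL.with do |redis|
  redis.set('gush.example.key', 'value')
end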

I released version 0.3.3, can you try and report back?

cabello commented 8 years ago

Hi @pokonski, thanks for the quick fix. I am trying to use the gem, but there is also a version 0.4.0; should 0.4.1 be released instead? https://rubygems.org/gems/gush/versions/0.4

pokonski commented 8 years ago

You are right; not sure how I missed the numbering. I'll release 0.4.1 :)

pokonski commented 8 years ago

@cabello 0.4.1 released, have a go!

peicodes commented 8 years ago

@pokonski Hey! Thanks for this change.

I noticed that the new version isn't on RubyGems yet: https://rubygems.org/gems/gush

pokonski commented 8 years ago

Duh, my bad. It is now :dash:

cabello commented 8 years ago

It's much better now, but we are still running into Redis connection limits. I plan on putting together an example so we can investigate together soon.

pokonski commented 8 years ago

Great, I'd love to see a snippet I can reproduce and base our fixes on :)

cabello commented 8 years ago

I think I've got a reasonable example; here it goes.

First, stop your Redis server and restart it with a low client limit (redis-server --maxclients 50), then start Sidekiq and Gush.

Then build an example workflow like this one:

class FooWorkflow < Gush::Workflow
  def configure(client_id)
    client = Client.find_by(id: client_id)

    # For each account, run an EggJob followed by a HamJob.
    jobs = client.accounts.map do |account|
      egg_job = run EggJob, params: { account_id: account.id }
      run HamJob, params: { account_id: account.id }, after: egg_job

      egg_job
    end

    # BarJob waits on all the EggJobs collected above.
    run BarJob, params: { client_id: client_id }, after: jobs
  end
end

Now with lots (a few thousand) of clients & accounts in the database, open a console and run:

Client.find_each do |client|
  FooWorkflow.new(client.id).start!
end

Gush will hit the connection limit very quickly. When I was running with no limit, the maximum number of connections I saw was ~75. So my first impression is that it doesn't grow out of control, but it's currently hard to predict how many connections are necessary.
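If you want to watch the client count yourself while the workflows run, here's a quick sketch using redis-rb (my addition, not part of the measurements above):

require 'redis'

# Poll the server's connected client count once per second; stop with Ctrl-C.
redis = Redis.new
loop do
  puts redis.info('clients')['connected_clients']
  sleep 1
end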

Hope this helps!

pokonski commented 8 years ago

Thanks for the detailed analysis! I'll have a deeper look into that :+1:

pokonski commented 7 years ago

I rechecked this case after recent changes and the maximum number of clients now tops out at around 33. Internally Gush uses more connection pooling than before, for every Redis action. If you still can, are you able to recheck that with the activejob branch?

The remaining problem is that running a lot of jobs spawns separate connection pools, each created independently. That is the biggest issue I see now.
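A hypothetical sketch of that shape of problem (EggJob and the key name are made up for illustration; this is not Gush's actual code):

# Anti-pattern: each job builds its own pool, so N concurrent jobs can
# hold up to N * pool_size connections between them.
class EggJob < Gush::Job
  def perform
    pool = ConnectionPool.new(size: 5) { Redis.new }
    pool.with { |redis| redis.incr('eggs.processed') }
  end
end

# Sharing a single process-wide pool instead keeps the total bounded by
# the pool size, regardless of how many jobs are running.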

pokonski commented 6 years ago

Version 1.0.0 decreases the number of operations during workflow processing, so it should improve things even more. Please open a new ticket if the issue still exists.