Closed by cabello 6 years ago
Hi @cabello! Thanks for reporting the issue. I didn't encounter it myself, though I was running hundreds of workflows at the same time.
This sounds like a bug, so I'll try to reproduce it. Can you share your Redis settings?
Alrighty, I think I got the solution. I modified the code to use the ConnectionPool that Sidekiq uses, so that should drastically reduce the number of Redis connections.
I released version 0.3.3, can you try it and report back?
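The pooling idea behind the fix can be sketched with the standard library alone (the actual change reuses the connection_pool gem Sidekiq depends on; this TinyPool class and its numbers are illustrative stand-ins, not Gush's real code):

```ruby
# Minimal sketch of a connection pool: a fixed-size queue of
# "connections" (Object stand-ins for Redis clients) shared by all jobs,
# instead of one connection per job.
class TinyPool
  def initialize(size:, &factory)
    @queue = Queue.new
    size.times { @queue.push(factory.call) }
  end

  # Check a connection out, yield it, and always return it to the pool.
  def with
    conn = @queue.pop
    begin
      yield conn
    ensure
      @queue.push(conn)
    end
  end
end

pool = TinyPool.new(size: 5) { Object.new }

# 50 concurrent "jobs" share at most 5 connections.
seen  = []
mutex = Mutex.new
threads = 50.times.map do
  Thread.new do
    pool.with { |conn| mutex.synchronize { seen << conn.object_id } }
  end
end
threads.each(&:join)

puts seen.size       # 50 jobs ran...
puts seen.uniq.size  # ...over at most 5 distinct connections
```

The key property is in `with`: the `ensure` block guarantees a connection goes back to the queue even if the job raises, so the total connection count stays bounded by the pool size.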
Hi @pokonski, thanks for the quick fix. I am trying to use the gem, but there is also a version 0.4.0; should 0.4.1 be released instead? https://rubygems.org/gems/gush/versions/0.4
You are right, not sure how I missed the numbering, I'll release 0.4.1 :)
@cabello 0.4.1 released, have a go!
@pokonski Hey! Thanks for this change.
I noticed that the new version isn't on RubyGems yet: https://rubygems.org/gems/gush
Duh, my bad. It is now :dash:
It's much better now, but we are still running into Redis connection limits. I plan on working on an example so we can investigate together soon.
Great, I'd love to see a snippet I can reproduce with and base our fixes on :)
I think I got a reasonable example, here it goes.
First, stop your Redis server and restart it with a low client limit: redis-server --maxclients 50
Then start Sidekiq and Gush.
Then build an example workflow like this one:
class FooWorkflow < Gush::Workflow
  def configure(client_id)
    client = Client.find_by(id: client_id)
    jobs = client.accounts.map do |account|
      egg_job = run EggJob, params: { account_id: account.id }
      run HamJob, params: { account_id: account.id }, after: egg_job
      egg_job
    end
    run BarJob, params: { client_id: client_id }, after: jobs
  end
end
Now with lots (a few thousand) of clients & accounts in the database, open a console and run:
Client.find_each do |client|
  FooWorkflow.new(client.id).start!
end
Gush will hit the connection limit very quickly. When I was running with no limit, the maximum number of connections I saw was ~75. So my first impression is that it doesn't grow out of control, but it's currently hard to predict how many connections are necessary.
Hope this helps!
Thanks for the detailed analysis! I'll have a deeper look into that :+1:
I rechecked this case after recent changes and the maximum number of clients now stops at around 33. Internally it uses more connection pooling than before for every Redis action. If you still can, are you able to recheck that with the activejob branch?
The remaining problem, though, comes from running a lot of jobs, each of which spawns its own connection pool independently. That is the biggest problem I see now.
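The difference between per-job pools and a shared pool can be sketched like this (hypothetical class names and pool contents for illustration only, not Gush's actual code):

```ruby
POOL_SIZE = 5

# Stand-in for building a pool of Redis clients; a real setup would
# create Redis connections here.
def build_pool
  Array.new(POOL_SIZE) { Object.new }
end

# Anti-pattern: every job instance spawns its own pool, so N concurrent
# jobs hold N * POOL_SIZE connections.
class LeakyJob
  def pool
    @pool ||= build_pool
  end
end

# Alternative: one process-wide pool, memoized at the class level, so
# all jobs in the process share the same POOL_SIZE connections.
class SharedPoolJob
  def self.pool
    @pool ||= build_pool
  end

  def pool
    self.class.pool
  end
end

leaky  = Array.new(3) { LeakyJob.new.pool }
shared = Array.new(3) { SharedPoolJob.new.pool }

puts leaky.map(&:object_id).uniq.size   # 3: a fresh pool per job
puts shared.map(&:object_id).uniq.size  # 1: one pool for the whole process
```

With per-job pools the connection count scales with the number of concurrently running jobs; with a shared pool it stays fixed regardless of how many workflows are started.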
Version 1.0.0 decreases the number of Redis operations during the processing of workflows, so it should improve things even more. Please open a new ticket if the issue still exists.
We are trying to use Gush in production and we are constantly exhausting the number of available connections.
If we run 2 workflows that fire ~5 workers each, everything runs fine. If we run 3 workflows, we hit the connection limit.
I wonder if there is an easy way to calculate how many connections are needed. If I try to run a few thousand workflows, do I need concurrency + a constant factor (for example, 10 * 5 connections), or do I need thousands of connections + (concurrency * constant factor)?
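For what it's worth, the first model can be written down as a back-of-envelope sketch (every parameter value here is an assumption for illustration, not a measured figure from Gush):

```ruby
# Assumption (not verified against Gush internals): connections scale
# with Sidekiq concurrency plus a small constant overhead, and are
# independent of the number of workflows queued.
def estimated_connections(concurrency:, per_worker: 1, overhead: 5)
  concurrency * per_worker + overhead
end

# Under that model, the workflow count drops out entirely:
puts estimated_connections(concurrency: 10)  # => 15
puts estimated_connections(concurrency: 50)  # => 55
```

If measurements instead grow as workflows are added (the second model), the per-job pool issue discussed above would be the likelier explanation.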