brandonhilkert / sucker_punch

Sucker Punch is a Ruby asynchronous processing library using concurrent-ruby, heavily influenced by Sidekiq and girl_friday.
MIT License

ActiveRecord::ConnectionNotEstablished Error #168

Closed: dashbitla closed this issue 8 years ago

dashbitla commented 8 years ago

We are using the latest sucker_punch gem (2.0) for SMS processing, and jobs fail with the ActiveRecord::ConnectionNotEstablished error shown below.

```
ActiveRecord::ConnectionNotEstablished ActiveRecord::ConnectionNotEstablished
/bts-tips/shared/bundle/ruby/1.9.1/gems/activerecord-3.2.11/lib/active_record/connection_adapters/abstract/connection_specification.rb:167:in `connection_pool'
/bts-tips/releases/20160309212826/app/jobs/sms_push_non_transactional_job.rb:7:in `perform'
/bts-tips/shared/bundle/ruby/1.9.1/gems/sucker_punch-2.0.1/lib/sucker_punch/job.rb:57:in `__run_perform'
/bts-tips/shared/bundle/ruby/1.9.1/gems/sucker_punch-2.0.1/lib/sucker_punch/job.rb:43:in `block in perform_in'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/executor/safe_task_executor.rb:24:in `call'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/executor/safe_task_executor.rb:24:in `block in execute'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/synchronization/mri_lockable_object.rb:62:in `block in synchronize'
/home/dashdeploybitla/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/monitor.rb:211:in `on_synchronize'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/synchronization/mri_lockable_object.rb:62:in `synchronize'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/executor/safe_task_executor.rb:19:in `execute'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/ivar.rb:170:in `safe_execute'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/scheduled_task.rb:285:in `process_task'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/executor/timer_set.rb:157:in `block (2 levels) in process_tasks'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/executor/ruby_thread_pool_executor.rb:348:in `call'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/executor/ruby_thread_pool_executor.rb:348:in `run_task'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/executor/ruby_thread_pool_executor.rb:337:in `block (3 levels) in create_worker'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/executor/ruby_thread_pool_executor.rb:320:in `loop'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/executor/ruby_thread_pool_executor.rb:320:in `block (2 levels) in create_worker'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/executor/ruby_thread_pool_executor.rb:319:in `catch'
/bts-tips/shared/bundle/ruby/1.9.1/gems/concurrent-ruby-1.0.1/lib/concurrent/executor/ruby_thread_pool_executor.rb:319:in `block in create_worker'
```

database.yml configuration:

```yaml
production:
  host: localhost
  adapter: mysql2
  encoding: utf8
  reconnect: false
  database: bts-db
  pool: 20
  username: btsdbuser
  password: btsdbpwd
  socket: /var/run/mysqld/mysqld.sock

sms_db:
  host: localhost
  adapter: mysql2
  encoding: utf8
  reconnect: false
  database: sms_db
  pool: 20
  username: btsdbuser
  password: btsdbpwd
  socket: /var/run/mysqld/mysqld.sock
```

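Here is the code:

```ruby
# NonTransactional SMS Pusher
class SmsPushNonTransactionalJob
  include SuckerPunch::Job
  workers 4 # up to 4 threads run this job concurrently

  def perform(sms_p_id)
    ActiveRecord::Base.connection_pool.with_connection do
      # Load the SMS record and process it (HTTP send plus sms_db update)
      # while holding a connection from ActiveRecord::Base's pool.
      SmsPusher.find(sms_p_id).process
    end
  end

end
```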

The code above loads the SMS record from the main database, calls an HTTP API to send the SMS, and then updates an SMS table that lives in a different database on the same database server.

The application runs under Passenger 4.0 with Nginx.

This happens for about 10% of the jobs, and it is also causing the main application's actions to fail with the same connection error.

We never had this issue before.

By the way, we have about 6 different sucker_punch jobs that do different things.

Any thoughts on what's going on?

brandonhilkert commented 8 years ago

Looks like that error is thrown from here, among other places.

If your pool is 20 and you have 6 different jobs, is it possible that all connections are being checked out? Usually when that's the case, you get a different error saying it couldn't obtain a connection within a certain amount of time.
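Brandon's point about pool exhaustion can be illustrated with a small, self-contained Ruby model. `ToyPool` and `CheckoutTimeout` are hypothetical names and this is not ActiveRecord's implementation; it only shows that when every connection is checked out, a further checkout waits and then fails with a timeout (ActiveRecord's own pool raises `ActiveRecord::ConnectionTimeoutError` in that situation) rather than with `ConnectionNotEstablished`.

```ruby
# Hypothetical toy model of a fixed-size connection pool. This is NOT
# ActiveRecord's implementation, only an illustration of the behaviour.
class ToyPool
  class CheckoutTimeout < StandardError; end

  def initialize(size)
    @available = Array.new(size) { |i| "conn-#{i}" }
    @lock = Mutex.new
    @cond = ConditionVariable.new
  end

  # Waits up to `timeout` seconds for a free connection, then gives up.
  def checkout(timeout)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @lock.synchronize do
      while @available.empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        if remaining <= 0
          raise CheckoutTimeout, "could not obtain a connection within #{timeout}s"
        end
        @cond.wait(@lock, remaining)
      end
      @available.pop
    end
  end

  # Returns a connection and wakes one thread waiting in #checkout.
  def checkin(conn)
    @lock.synchronize do
      @available.push(conn)
      @cond.signal
    end
  end
end

pool = ToyPool.new(2)      # as if database.yml said `pool: 2`
first = pool.checkout(0.1)
pool.checkout(0.1)         # both connections are now checked out
begin
  pool.checkout(0.1)       # a third thread waits, then times out
rescue ToyPool::CheckoutTimeout => e
  puts "#{e.class}: #{e.message}"
end
pool.checkin(first)
pool.checkout(0.1)         # succeeds once a connection is checked back in
```

The point of the model is the failure mode: an exhausted pool produces a timeout-style error, which is why exhaustion alone would not obviously explain `ConnectionNotEstablished`.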

Also, you might want to check the number of connections in use while your application is running, to make sure they're being checked back in.
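One way to picture "checked back in" is a pool that counts checkouts and returns each connection in an `ensure` clause. This is a hypothetical sketch: `CountingPool` is not an ActiveRecord class and the simulated HTTP failure is invented. It assumes that a connection which is never returned would show up as an `in_use` count that stays above zero after all jobs have finished.

```ruby
# Hypothetical sketch, not ActiveRecord's API: a pool that tracks how many
# connections are checked out, with a with_connection helper that always
# checks the connection back in through `ensure`.
class CountingPool
  def initialize(size)
    @free = Array.new(size) { |i| "conn-#{i}" }
    @in_use = 0
    @lock = Mutex.new
  end

  # Connections currently checked out; sample this while the app runs.
  def in_use
    @lock.synchronize { @in_use }
  end

  def with_connection
    conn = @lock.synchronize do
      raise "pool exhausted" if @free.empty?
      @in_use += 1
      @free.pop
    end
    begin
      yield conn
    ensure
      # Runs whether the block returns normally or raises.
      @lock.synchronize do
        @in_use -= 1
        @free.push(conn)
      end
    end
  end
end

pool = CountingPool.new(20)
jobs = 4.times.map do |i|
  Thread.new do
    begin
      pool.with_connection do |_conn|
        sleep 0.01
        raise "simulated HTTP failure" if i.even? # some jobs fail midway
      end
    rescue RuntimeError
      # The job failed, but its connection was still checked back in.
    end
  end
end
jobs.each(&:join)
puts pool.in_use # => 0
```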

dashbitla commented 8 years ago

We also tried a pool size of 100. All the other jobs use the default limit of 2 workers, so even if every worker runs concurrently, these workers should need no more than 14 connections (4 + 5 × 2 = 14).
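The 14-connection figure can be checked with a quick calculation. This sketch assumes each worker thread holds at most one connection from a given pool at a time; only `SmsPushNonTransactionalJob` comes from the thread, and the other job names are hypothetical placeholders. Connections used by web requests in the same process would come on top of this number.

```ruby
# Worker counts described in the thread: SmsPushNonTransactionalJob uses
# `workers 4`; the other five jobs keep sucker_punch's default of 2.
# The other job names are hypothetical placeholders.
WORKERS_PER_JOB = {
  "SmsPushNonTransactionalJob" => 4,
  "JobB" => 2,
  "JobC" => 2,
  "JobD" => 2,
  "JobE" => 2,
  "JobF" => 2
}.freeze

# Worst case for one process: every worker thread of every job runs at once
# and each holds one connection from the pool (assumed, not measured).
def max_job_connections(workers_per_job)
  workers_per_job.values.sum
end

pool_size = 20 # `pool: 20` from database.yml
needed = max_job_connections(WORKERS_PER_JOB)
puts "up to #{needed} of #{pool_size} connections per pool" # => up to 14 of 20 connections per pool
```

Assuming the SMS models reach `sms_db` through their own connection, `production` and `sms_db` each have a separate `pool: 20`, so the 14-connection bound applies to each pool rather than to their combined total.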

Could it have to do with using multiple databases, where the SMS is read from one database, pushed via an HTTP API call, and then updated in another, shared SMS database, as specified in database.yml?

Or could it have to do with Passenger? Does something need to be configured for this?

brandonhilkert commented 8 years ago

Honestly, I'm not sure. It sounds application-specific. I wish I could give you more guidance, but I've not come across this error before. Definitely sounds like you're headed down the right path with the DB configuration though.


(Sent by email in reply to Dash Bitla's comment of Tue, Mar 15, 2016, 4:38 PM.)

dashbitla commented 8 years ago

Could it be related to the Passenger configuration? Does something need to be configured for the Passenger environment?

brandonhilkert commented 8 years ago

I haven't used Passenger in a long time, so I don't know if that has something to do with it. Perhaps you could try Unicorn/Puma and see if the problem persists.

dashbitla commented 8 years ago

We run the application under Passenger with 40 processes on each of 10 servers, so that's about 400 processes' worth of capacity for these workers to use.

I feel the simplest solution under Passenger, or any environment that runs multiple processes, is to limit each job to 1 or 2 workers. This works for us because we use all the Passenger processes across all the servers to handle sucker_punch jobs!

In our case, I just reduced the workers per job to 1 for less frequent jobs and 2 for frequent ones, and things are back on track. For us, this means a job can still run as many as 400 workers.

I also sized the pool and MySQL's max_connections to support these connections.
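The cluster-wide numbers can be sketched the same way. The 10 servers with 40 Passenger processes each come from the thread; the three-and-three split between 2-worker and 1-worker jobs, one request thread per Passenger process, and one connection per thread per database are assumptions, so the result is an illustrative upper bound rather than a measured figure.

```ruby
# Hypothetical split after the change: the thread says frequent jobs got 2
# workers and infrequent ones got 1, but not which is which, so three of
# each is an assumption.
WORKERS_AFTER_CHANGE = [2, 2, 2, 1, 1, 1].freeze

SERVERS = 10              # from the thread
PROCESSES_PER_SERVER = 40 # from the thread

# Total worker threads a single job can run across the whole cluster.
def cluster_workers(workers_per_process, servers: SERVERS, processes: PROCESSES_PER_SERVER)
  workers_per_process * servers * processes
end

# Worst-case MySQL connections across the cluster. Assumes one request thread
# per Passenger process and that a thread touching both databases holds one
# connection to each (both live on the same MySQL server).
def worst_case_mysql_connections(workers, servers: SERVERS, processes: PROCESSES_PER_SERVER,
                                 request_threads: 1, databases: 2)
  threads_per_process = workers.sum + request_threads
  threads_per_process * databases * servers * processes
end

puts cluster_workers(1)                                  # => 400
puts worst_case_mysql_connections(WORKERS_AFTER_CHANGE) # => 8000
```

With 1 worker per process, a job can use 400 workers across the cluster, which matches the figure quoted in the comment. MySQL's `max_connections` is a server-wide limit (151 by default), so when both databases share one server, connections to either database count against the same limit.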

Thanks for all the clarifications, Brandon.

Thanks for the awesome sucker_punch gem. Really LOVE :heart: it.

_We're moving away from Sidekiq & Redis, saving on worker servers and making better use of the app servers. Next we plan to switch to Memcached for caching and try to use Memcached directly from MySQL 5.6. A BIG THANK YOU :heart: :green_heart:_

brandonhilkert commented 8 years ago

:+1: Happy to help. Good luck :)