brandonhilkert / sucker_punch

Sucker Punch is a Ruby asynchronous processing library using concurrent-ruby, heavily influenced by Sidekiq and girl_friday.
MIT License

Jobs not always running in rake task #227

Closed: paul closed this issue 3 years ago

paul commented 5 years ago

I'm using the sucker_punch ActiveJob queue adapter in a Rails app. I've created a job that bulk-fetches pricing data from an API and loads it into Postgres via ActiveRecord models. I'm also (ab)using the job within a rake task so I can run several of them in parallel, since the import takes a while. About half the time this works flawlessly; the other half, the rake task hangs, with the last thing in the log being Enqueued GetPricesJob (Job ID: 3a0197a2-71b9-49a7-a9d0-e1707d8de0f4) to SuckerPunch(default) with arguments:.

If I hit Ctrl-C twice, the rake task gets an Interrupt, and the jobs start to run and finish successfully. However, the main rake task's DB transaction gets rolled back because of the Interrupt.

My rake task looks like this; I've tried it with and without the transaction and with_connection wrapper blocks:

task import: :environment do
  SuckerPunch.shutdown_timeout = 90
  ActiveRecord::Base.connection_pool.with_connection do
    Import.transaction do
      # do some database work
      ImportPricesJob.perform_later(some_ids)
    end
  end
end

Is this something that should be possible? Or am I breaking sucker_punch in an unsupported way?

Thanks!

brandonhilkert commented 5 years ago

I can't imagine why this wouldn't work. The one thing that comes to mind is that if the job is enqueued and the rake task ends, the job is going to get cut off. People are then left to put in some kind of arbitrary sleep, or implement some kind of latch that knows when the job is done.
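
A minimal sketch of that latch idea, assuming concurrent-ruby's Concurrent::CountDownLatch (sucker_punch already depends on concurrent-ruby); the LATCH constant and the batch count of 3 are illustrative, not taken from the app above:

require "concurrent"

class ImportPricesJob < ApplicationJob
  # One count per batch the rake task plans to enqueue.
  LATCH = Concurrent::CountDownLatch.new(3)

  def perform(ids)
    # ... fetch prices and import them ...
  ensure
    LATCH.count_down # signal the rake task, even if the job raised
  end
end

# In the rake task, after enqueuing all the batches:
# finished = ImportPricesJob::LATCH.wait(90) # returns false on timeout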

It's hard for me to say here since I don't have a full picture of the logic.

There's nothing about sucker_punch itself that would cause it to hang. It's more likely some other condition, like waiting on a DB connection or some other I/O without a specified timeout.

paul commented 5 years ago

Yeah, it surprised me, too. I'm not doing anything special in the job, just using HTTP.rb to make requests and ActiveRecord to create records. I do have it wrapped in another with_connection block, though, as suggested in the README:

class ImportPricesJob < ApplicationJob
  def perform(ids)
    ActiveRecord::Base.connection_pool.with_connection do
      # HTTP.get(url_from_ids)
      # Price.import(data_from_responses)
    end
  end
end

The rake task only takes a few seconds to do its work, but then the job can take tens of seconds afterwards (which is why I added the shutdown_timeout = 90).

This is in a new Rails 6.0.0.rc1 app, so maybe Spring or Bootsnap or any number of other things is also trying to start threads and it's getting confused.

When I hit Ctrl-C, the backtrace from the Interrupt exception looks like this:

Interrupt:
/home/rando/.gem/ruby/2.6.3/gems/activesupport-6.0.0.rc1/lib/active_support/concurrency/share_lock.rb:221:in `wait_for'
/home/rando/.gem/ruby/2.6.3/gems/activesupport-6.0.0.rc1/lib/active_support/concurrency/share_lock.rb:190:in `block in yield_shares'
/home/rando/.gem/ruby/2.6.3/gems/activesupport-6.0.0.rc1/lib/active_support/concurrency/share_lock.rb:189:in `yield_shares'
/home/rando/.gem/ruby/2.6.3/gems/activesupport-6.0.0.rc1/lib/active_support/dependencies/interlock.rb:47:in `permit_concurrent_loads'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/postgresql/database_statements.rb:89:in `block in execute'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/abstract_adapter.rb:685:in `block (2 levels) in log'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/abstract_adapter.rb:684:in `block in log'
/home/rando/.gem/ruby/2.6.3/gems/activesupport-6.0.0.rc1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/abstract_adapter.rb:675:in `log'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/postgresql/database_statements.rb:88:in `execute'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/postgresql/database_statements.rb:163:in `exec_rollback_db_transaction'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/abstract/database_statements.rb:332:in `rollback_db_transaction'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/abstract/query_cache.rb:22:in `rollback_db_transaction'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/abstract/transaction.rb:178:in `rollback'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/abstract/transaction.rb:264:in `block in rollback_transaction'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/abstract/transaction.rb:262:in `rollback_transaction'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/abstract/transaction.rb:275:in `rescue in block in within_new_transaction'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/abstract/transaction.rb:273:in `block in within_new_transaction'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/abstract/transaction.rb:270:in `within_new_transaction'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/connection_adapters/abstract/database_statements.rb:277:in `transaction'
/home/rando/.gem/ruby/2.6.3/gems/activerecord-6.0.0.rc1/lib/active_record/transactions.rb:212:in `transaction'
myapp/lib/tasks/setup.rake:39:in `block (3 levels) in <main>'

So it does look like something is waiting for a connection. My AR pool size is the default of 5, and I haven't changed the number of sucker_punch workers from the default of 2, so I don't know what the rake task's connection is waiting for.
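
One way to see whether the pool or the sucker_punch workers really are the bottleneck is to dump the stats from inside the hung task. A rough diagnostic sketch, assuming Rails 5.1+ (which adds connection_pool.stat) and sucker_punch 2.x:

# Prints pool usage (size, busy, idle, waiting, checkout_timeout, ...)
puts ActiveRecord::Base.connection_pool.stat.inspect

# Prints the per-job-class worker and job counts kept by sucker_punch
puts SuckerPunch::Queue.stats.inspect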

brandonhilkert commented 5 years ago

Definitely does look like something in Rails is waiting for the PG connection. I'm not familiar with those last few classes in that trace. Someone with more AR experience might be able to shed some light.

A few random ideas:

  1. Have you tried doing it without with_connection? (See the sketch below.)
  2. Curious whether this behavior remains in a stable version of Rails.
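
For reference, the first idea would strip the task down to roughly this (a sketch based on paul's snippet above, changing nothing except dropping the with_connection wrapper):

task import: :environment do
  SuckerPunch.shutdown_timeout = 90
  Import.transaction do
    # do some database work
    ImportPricesJob.perform_later(some_ids)
  end
end
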
oshanz commented 4 years ago

I'm migrating data from PostgreSQL to Elasticsearch with sucker_punch in a rake task. This is from the log file:

Queued jobs didn't finish before shutdown_timeout...killing remaining jobs

oshanz commented 4 years ago

Not sure how efficient or accurate this is, but it seems to be working:

loop do
  enqueued_count = SuckerPunch::Queue.stats.dig("ESUpdateJob", "jobs", "enqueued").to_i
  busy_count = SuckerPunch::Queue.stats.dig("ESUpdateJob", "workers", "busy").to_i
  break if enqueued_count.zero? && busy_count.zero?
end
puts "Queue is empty"

So the main thread of the rake task just sits there, watching the queue stats until everything has drained.
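
A slightly gentler variant of the same idea, with a sleep between polls and an overall deadline, assuming the same SuckerPunch::Queue.stats keys (the 5-minute deadline is just an example):

deadline = Time.now + 300 # give the jobs up to 5 minutes

loop do
  stats    = SuckerPunch::Queue.stats.fetch("ESUpdateJob", {})
  enqueued = stats.dig("jobs", "enqueued").to_i
  busy     = stats.dig("workers", "busy").to_i

  break if enqueued.zero? && busy.zero?
  raise "ESUpdateJob queue did not drain before the deadline" if Time.now > deadline

  sleep 1 # poll once a second instead of busy-waiting
end
puts "Queue is empty"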

K-S-A commented 3 years ago

Hi @paul, we used to have a similar issue. When dealing with async processing that requires external connections, it's necessary to terminate them properly on exit.

Here is an example for RabbitMQ connection: https://github.com/cimon-io/thumper/blob/master/lib/tasks/messagebroker.rake
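
The same shape with a generic external client (SomeExternalClient and SomeSyncJob are hypothetical stand-ins, not taken from the linked file or this thread): open the connection in the task, hand the work to the jobs, and close it only once the queue has drained.

task sync: :environment do
  client = SomeExternalClient.connect(ENV.fetch("BROKER_URL")) # hypothetical client
  begin
    SomeSyncJob.perform_later(record_ids)
    # ... wait for the sucker_punch queue to drain, e.g. with the loop above ...
  ensure
    client.close # terminate the external connection before the process exits
  end
end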

paul commented 3 years ago

@K-S-A Thanks for the update. I'm not working on this project any more, so I don't have any way to test it, but it might be useful to others with the same problem who come looking!