brandonhilkert / sucker_punch

Sucker Punch is a Ruby asynchronous processing library using concurrent-ruby, heavily influenced by Sidekiq and girl_friday.
MIT License

Job being killed by Passenger? Or deadlock? #67

Closed david-boyd closed 10 years ago

david-boyd commented 10 years ago

Hi all, I'm wondering if anyone has seen this problem. I have a long-running sucker_punch job (it takes around 10 hours); it's an overnight batch. I am running on Phusion Passenger with (I think) 3 worker threads. Status from passenger-status:

    ----------- General information -----------
    max = 3
    count = 0
    active = 0
    inactive = 0
    Waiting on global queue: 0

My sucker_punch job is executed asynchronously. As part of the job, it executes other, smaller async sucker_punch jobs (which take around 30 seconds).

I cannot determine exactly what is going on, but 'sometimes' my long-running job just dies or seems to halt. I added some debug code around the entire sucker_punch job:

    begin
      # ... job body ...
    rescue Exception => e
      logger.error(e)
      raise e
    end

However, I didn't see an exception, so I'm assuming my long-running sucker_punch job is being halted rather than killed? Or is it potentially some sort of deadlock?
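One possible reason for a missing exception, sketched in plain Ruby (no sucker_punch or Passenger involved, and not confirmed as the cause in this setup): when a thread is terminated with `Thread#kill`, Ruby raises nothing that a `rescue` clause can catch, although `ensure` blocks still run. The helper name `run_and_kill_job` and the event symbols are made up for illustration.

```ruby
require "thread"

# Hypothetical helper: runs a stand-in "job" in a background thread, kills the
# thread mid-run, and returns the events recorded along the way.
def run_and_kill_job
  events  = Queue.new   # thread-safe event log
  started = Queue.new   # signals that the job has entered its begin block

  worker = Thread.new do
    begin
      events << :started
      started << true
      sleep                   # stand-in for hours of batch work
      events << :finished
    rescue Exception => e     # same pattern as the debug code in the post
      events << :rescued
      raise e
    ensure
      events << :ensure_ran   # still runs when the thread is killed
    end
  end

  started.pop   # wait until the job is inside the begin block
  worker.kill   # simulate the thread being terminated from outside
  worker.join

  recorded = []
  recorded << events.pop until events.empty?
  recorded
end

p run_and_kill_job
```

In this sketch the `rescue` branch is never recorded, so a `logger.error` placed there would stay silent. Logging from an `ensure` block can catch this case, but if the whole process is killed with SIGKILL, not even `ensure` runs.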

The interesting part is that sometimes my long-running job works fine, and sometimes it doesn't.
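For an intermittent halt like this, a heartbeat log is one way to narrow down when and how the job stops. The following is a minimal sketch in plain Ruby, assuming a standard `Logger`; the helper name `with_heartbeat`, the label, and the intervals are hypothetical and not part of sucker_punch.

```ruby
require "logger"
require "stringio"

# Hypothetical helper: logs a heartbeat line every `interval` seconds while
# the given block runs, and logs when the block is left.
def with_heartbeat(logger, label, interval: 60)
  beat = Thread.new do
    loop do
      logger.info("#{label}: heartbeat pid=#{Process.pid}")
      sleep interval
    end
  end
  result = yield
  logger.info("#{label}: finished")
  result
ensure
  # Runs on normal completion, on exceptions, and on Thread#kill.
  logger.info("#{label}: leaving job body")
  beat.kill if beat
end

# Usage with an in-memory log and short intervals, for demonstration only.
log = StringIO.new
with_heartbeat(Logger.new(log), "nightly_batch", interval: 0.05) { sleep 0.2 }
puts log.string
```

If the heartbeat lines stop without a "leaving job body" line, the process itself most likely went away. If they keep appearing while the job makes no progress, the job thread is more likely blocked. Both readings are heuristics rather than proof.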

david-boyd commented 10 years ago

FYI, I am using Passenger 3 on Rails 3.2 with sucker_punch gem version 1.1.

brandonhilkert commented 10 years ago

Where is the app hosted?

david-boyd commented 10 years ago

Engine Yard (small AWS instance) https://support.cloud.engineyard.com/entries/23852283-Worker-Allocation-on-Engine-Yard-Cloud

david-boyd commented 10 years ago

Passenger worker threads might be a false lead as it doesn't seem like active worker threads change when my job executes.

brandonhilkert commented 10 years ago

I'm not sure I can be much help. Heroku is known to restart processes once every 24 hours. In general, I've stated that Sucker Punch is not fit for long running jobs, or jobs where completion is super important (http://brandonhilkert.com/blog/why-i-wrote-the-sucker-punch-gem/#comment-1286523284).

I'm not sure I can speak to exactly why your thread is being killed in your ENV. It's tremendously dependent on your architecture and I don't have any experience with your particular setup.

Perhaps the Passenger list might offer some insight?