brandonhilkert / sucker_punch

Sucker Punch is a Ruby asynchronous processing library using concurrent-ruby, heavily influenced by Sidekiq and girl_friday.

Passenger problems #198

Closed · ghost closed this issue 7 years ago

ghost commented 7 years ago

So I have a Rails 4.2 app with two DBs: one for Rails, one for a load of data. I have a worker which inserts data into the data DB and uses the Postgres NOTIFY mechanism to let the application know it is finished. I then use Sucker Punch to handle the notifications (send them to Pusher, in fact). It works fine under Puma, but not under Passenger. The stripped-down code which exhibits the behaviour is:

class UpdatesJob
  include SuckerPunch::Job

  def perform
    SecondDB::Base.connection_pool.with_connection do |conn|
      pgconn = conn.raw_connection
      pgconn.async_exec 'LISTEN events'
      begin
        loop do
          pgconn.wait_for_notify do |channel, pid, json|
            logger.warn('event')
          end
        end
      rescue Exception => e
        logger.warn('loop exception %s' % e.message)
      else
        logger.warn('loop exit without exception')
      ensure
        logger.warn('ensure UNLISTEN')
        pgconn.async_exec 'UNLISTEN events'
      end
    end
  end
end

I have an UpdatesJob.perform_async call in an initialiser.
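Roughly, the initialiser is nothing more than this (the file name is illustrative):

# config/initializers/updates_listener.rb (name is illustrative)
# Kick off the long-lived LISTEN job once when the app boots.
UpdatesJob.perform_async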

The behaviour under Passenger is the following:

event
...
event
ensure UNLISTEN

About 12 events over 5 minutes, then an exception which is swallowed by the with_connection wrapper and drops me into the ensure block. Removing the with_connection allows me to catch the exception: with SSL on it's "corrupt response MAC" or some such; with SSL turned off it's "server closed the connection".

I have no idea what is going on here. This is on real hardware, not Heroku: Debian stable, but repeatable on my Ubuntu laptop. I did see some comments in issues from 2013, but apparently Sucker Punch has obsoleted those now.

Any idea what would be happening here?

brandonhilkert commented 7 years ago

I don't. I'm not familiar enough with Passenger to offer suggestions on what might be happening under the hood. SuckerPunch was made with the purpose of performing small, fast jobs. Keeping a job open like this for an extended period of time, while it might work, just wasn't its intended use case.

Stepping back, it feels like Passenger is closing a process or thread that contains this longer-running job. It's possible their model doesn't work well with the library. I've not heard of other reports like this, but I also haven't heard of others doing longer-running jobs.

It feels like this kind of logic might be better served by a stand-alone process. That way, you get to control the startup/shutdown on your own, without being sensitive to threads. If you want to stay with the threading approach, you might also try creating your own thread in Ruby that does this listening. From the code above, there doesn't look to be anything SuckerPunch-specific that gives you an advantage. To me, that would be things like enqueueing many jobs over time. Given that this one stays running, the same could be done by starting a thread in the background on startup, similar to how error-reporting gems work when reporting caught exceptions.
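As a rough sketch of that last idea (reusing SecondDB::Base and the 'events' channel from your code; the initialiser placement and the retry/backoff are assumptions on my part):

# config/initializers/events_listener.rb (illustrative) -- a plain Ruby thread
# started once at boot, instead of a SuckerPunch job.
Thread.new do
  loop do
    begin
      SecondDB::Base.connection_pool.with_connection do |conn|
        pgconn = conn.raw_connection
        pgconn.async_exec 'LISTEN events'
        begin
          loop do
            pgconn.wait_for_notify do |channel, pid, json|
              Rails.logger.warn('event') # push to Pusher etc. here
            end
          end
        ensure
          pgconn.async_exec 'UNLISTEN events'
        end
      end
    rescue => e
      Rails.logger.warn("listener died: #{e.message}; reconnecting")
      sleep 5 # back off before reconnecting
    end
  end
end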

Hope this helps! Sorry I couldn't offer more insight.

brandonhilkert commented 7 years ago

One other thought: would there be a way to use jobs instead of the DB pub/sub for notification?

Somewhere else you must send the "finished" event to PG, so rather than push it to the DB, can you enqueue another SuckerPunch job that would run the code currently triggered by the listener? This would make the jobs smaller and more robust, surviving across process restarts.
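Roughly what I'm picturing, with a made-up job name and payload (and assuming the "finished" signal originates somewhere in Ruby):

# Hypothetical job that does what the LISTEN handler does today.
class PushUpdateJob
  include SuckerPunch::Job

  def perform(payload)
    SuckerPunch.logger.warn("event: #{payload.inspect}") # send to Pusher here
  end
end

# Wherever the insert finishes, instead of issuing NOTIFY:
PushUpdateJob.perform_async(table: 'events')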

ghost commented 7 years ago

Thanks for those helpful remarks. This Passenger issue kind of blindsided me, having run a stress test on Puma for 12+ hours with no issues; ho hum, back to the drawing board, I'll organise my own daemon for this.

Unfortunately enqueueing SP jobs is not an option since my "insertion program" is written in C :smile:

This is clearly not an SP issue in any case, so closing; thanks again.

brandonhilkert commented 7 years ago

@jjgreen Ah, gotcha. I worked on a system that used Redis pub/sub in a similar way, and we had a longer-running process listening for those events coming from Redis; that worked great. It was a bummer to have to go through the work, but definitely the more reliable route.

ghost commented 7 years ago

Just to close this off: the problem turned out to be Passenger "smart spawning", which duplicates file descriptors; one must re-establish the DB connection after forking so that one process closing its connection does not close the other's. Passenger handles this for the default ActiveRecord connection, but not for my secondary connection.
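For reference, re-establishing the secondary connection per worker would look something like the standard smart-spawning hook in an initialiser (the :second_db key stands in for whatever the secondary database is called in database.yml):

# Re-establish the secondary connection in each forked worker so the
# processes don't end up sharing (and then closing) the same socket.
if defined?(PhusionPassenger)
  PhusionPassenger.on_event(:starting_worker_process) do |forked|
    SecondDB::Base.establish_connection(:second_db) if forked
  end
end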

In any case, I followed good advice and moved the code into a daemon, which proved to be straightforward and only took a couple of hours.

brandonhilkert commented 7 years ago

Ah, awesome. Makes sense. Thanks for following up. I'm sure others have been bitten by this before.

Glad to hear the process route went well for you.
