brandonhilkert / sucker_punch

Sucker Punch is a Ruby asynchronous processing library using concurrent-ruby, heavily influenced by Sidekiq and girl_friday.
MIT License
2.65k stars 114 forks source link

Workers not finishing jobs #154

Closed nynhex closed 8 years ago

nynhex commented 8 years ago

I have a Rails 3.2.22 app with Sucker Punch 1.5.x installed and things seem to be working fine for small tasks. In one of my controllers I have a paging functionality which creates a Page (an sms message), saves it to AR, and emails/SMS the recipients.

This works fine when I'm sending pages out to 3-4 individuals but when I have several pages going out (ie to 10+ instances) the mail will be received but twilio messages don't always get delivered.

Can you look at my code to see how I can improve it without running out of DB pool connections? Btw, my db pool in dev and production is set to 10.

paging_controller

  def create
    message = params[:paging][:message]
    units_to_page_from_params(params[:paging]).each do |unit|
      @page = Paging.new(params[:paging])
      @page.unit_id = unit.id

      if @page.save
        PagingSmsJob.new.async.perform(@page.id)
        PagingMailerJob.new.async.perform(@page.id)
      end
    end

    respond_to do |format|
      format.html {redirect_to pagings_path, notice: "Page was successfully sent"}
      format.js {render "index", notice: "Page was successfully sent"}
    end
  end
private
  def units_to_page_from_params(params)
      units_to_be_notified = []

     if params[:region_id].present?
      region = Region.find_all_by_id(params[:region_id])
      units_to_be_notified.concat(region.flat_map{ |r| r.units.active})
    end

    if params[:unit_id].present?
      units = if params[:unit_id].index "all"
        Unit.active
      else
        Unit.find_all_by_id(params[:unit_id])
      end
      units_to_be_notified.concat(units)
    end
    units_to_be_notified.uniq
  end

paging_mailer_job.rb

class PagingMailerJob
  include SuckerPunch::Job
  workers 4

  def perform(paging_id)
    ActiveRecord::Base.connection_pool.with_connection do
      page = Paging.find(paging_id)
      PagingsMailer.paging(page.unit.incharge, page.unit, page.message).deliver
      PagingsMailer.paging(page.unit.attendant, page.unit, page.message).deliver
    end
  end
end

paging_sms_job.rb

class PagingSmsJob
  include SuckerPunch::Job
  workers 4

  def perform(paging_id)
    ActiveRecord::Base.connection_pool.with_connection do
      page = Paging.find(paging_id)
      setup_twilio
      @client.account.messages.create(
         :from => Company[:twilio_number],
         :to => page.unit.incharge.medic_phone,
         :body => "Message from dispatch: #{page.unit.unit_name} #{page.unit.incharge.medic_name} #{page.message}"
        )
      @client.account.messages.create(
        :from => Company[:twilio_number],
         :to => page.unit.attendant.medic_phone,
         :body => "Message from dispatch: #{page.unit.unit_name} #{page.unit.attendant.medic_name} #{page.message}"
        )
      end
    end

private

  def setup_twilio
    account_sid = Company[:twilio_sid]
    auth_token = Company[:twilio_token]
    @client = Twilio::REST::Client.new account_sid, auth_token
  end
end

Again, the mailer job will fire nice and fast and all mail gets delivered, but the SMS job gets about 70% of the pages when I select a high number of units to page.

This may be a stack question, but figured there might be some sort of tuning I can do to fix this.

nynhex commented 8 years ago

Also, this may be some sort of rate limiting issue with Twilio too. I haven't thought of that, so maybe you have an idea of where I should inject sleep in the perform method to give twilio time to catch up?

brandonhilkert commented 8 years ago

I can't speak to what the error might be with Twilio. One thought might be to log the response of the call to account.messages.create. Also, if you're using a error reporting service, see the README for the appropriate version to log exceptions thrown from threads. It'll let them bubble up, which doesn't seem like it'd be happening now.

re: pool. From those 2 jobs alone, assuming the queue is backfilled, that's 8 connections from the start, not including any that might be related to web processes like unicorn, or puma. Depending on your setup for the application server, you'd need a bunch more. I'd start with 25 and watch the exceptions for exhausting the connection pool and increase if it seems reasonable given the load.

nynhex commented 8 years ago

@brandonhilkert thanks so much for the good advice. I think the issue with Twilio is I'm hitting the API rate limit so I need to refactor so it'll throttle some sleep into the job. I can handle this.

You're totally right about the DB pool. I was at 10 but I bumped it up to 25 to see how it goes in staging and production. I'm running passenger so it's multi-process single thread, so I'm thinking I'm ok with 25. I'll let it run in staging and do some heavy testing, but I really appreciate the help. Did I tell you how much I :heart: sucker_punch? I'm typically a sidekiq fan but not having to worry about redis/technical debt is nice. Celluoid and your gem does a great job for basic things like what I'm doing (email and SMS). See you on twitter! :+1:

brandonhilkert commented 8 years ago

Glad it's working well. Cheers and thanks for the kind words!

On Sunday, January 17, 2016, James notifications@github.com wrote:

@brandonhilkert https://github.com/brandonhilkert thanks so much for the good advice. I think the issue with Twilio is I'm hitting the API rate limit so I need to refactor so it'll throttle some sleep into the job. I can handle this.

You're totally right about the DB pool. I was at 10 but I bumped it up to 25 to see how it goes in staging and production. I'm running passenger so it's multi-process single thread, so I'm thinking I'm ok with 25. I'll let it run in staging and do some heavy testing, but I really appreciate the help. Did I tell you how much I [image: :heart:] sucker_punch? I'm typically a sidekiq fan but not having to worry about redis/technical debt is nice. Celluoid and your gem does a great job for basic things like what I'm doing (email and SMS). See you on twitter! [image: :+1:]

— Reply to this email directly or view it on GitHub https://github.com/brandonhilkert/sucker_punch/issues/154#issuecomment-172365114 .


_Build a Ruby Gem is available! http://brandonhilkert.com/books/build-a-ruby-gem/?utm_source=gmail-sig&utm_medium=email&utm_campaign=gmail_

http://brandonhilkert.com

nynhex commented 8 years ago

@brandonhilkert oh and I figured out the twilio problem. I had set every number in my test database to my own cell phone number. I had done some heavy testing and received a carrier violation (rate limit from t-mobile) so messages were bouncing. Lesson learned, don't send 100 messages in a 2 minute span :+1: